`skfolio.distribution`.VineCopula

class skfolio.distribution.VineCopula(fit_marginals=True, marginal_candidates=None, copula_candidates=None, max_depth=4, log_transform=False, central_assets=None, dependence_method=KENDALL_TAU, selection_criterion=AIC, independence_level=0.05, n_jobs=None, random_state=None)

Regular Vine Copula Estimator.

This model first fits the best univariate distribution for each asset and transforms the data to uniform marginals via the fitted CDFs. It then constructs a regular vine copula tree by tree: the structure of each tree is selected with a maximum spanning tree algorithm based on a given dependence measure, and the best bivariate copula is selected from a list of candidates for each edge [1].

Regular vines capture complex, fat-tailed dependencies and tail co-movements between asset returns.

It also supports conditional sampling, enabling stress testing and scenario analysis by generating samples under specified conditions.

Moreover, by marking some assets as central, this novel implementation can capture clustered or C-like dependency structures, allowing a more nuanced representation of hierarchical relationships among assets and improving conditional sampling and stress testing.

Parameters:

fit_marginals : bool, default=True

Whether to fit marginal distributions to each asset before constructing the vine. If True, the data will be transformed to uniform marginals using the fitted CDFs.
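The marginal step is the probability integral transform: each asset's returns are passed through its fitted CDF, so the transformed values are approximately uniform on (0, 1). The pure-Python sketch below illustrates the idea under a simplifying assumption: every marginal is a Gaussian fitted by its sample mean and standard deviation, whereas VineCopula selects among several candidate distributions. The helper names are hypothetical and not part of skfolio.

```python
import math
import statistics


def fit_gaussian(x):
    """Fit a Gaussian marginal by its sample mean and standard deviation."""
    return statistics.fmean(x), statistics.stdev(x)


def gaussian_cdf(v, mu, sigma):
    """Gaussian CDF written with the error function."""
    return 0.5 * (1.0 + math.erf((v - mu) / (sigma * math.sqrt(2.0))))


def to_uniform_marginals(columns):
    """Apply the probability integral transform asset by asset."""
    uniforms = []
    for x in columns:
        mu, sigma = fit_gaussian(x)
        uniforms.append([gaussian_cdf(v, mu, sigma) for v in x])
    return uniforms


# Toy returns for two assets (one list per asset); values are illustrative only.
returns = [
    [0.010, -0.020, 0.015, 0.003, -0.007],
    [0.020, -0.010, 0.010, -0.004, 0.001],
]
u = to_uniform_marginals(returns)
```

Because a CDF is increasing, the transform preserves the ranks of each asset's returns, which is what the copula then models.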

marginal_candidates : list[BaseUnivariateDist], optional

Candidate univariate distribution estimators to fit the marginals. If None, defaults to [Gaussian(), StudentT(), JohnsonSU()].

copula_candidates : list[BaseBivariateCopula], optional

Candidate bivariate copula estimators. If None, defaults to [GaussianCopula(), StudentTCopula(), ClaytonCopula(), GumbelCopula(), JoeCopula()].

max_depth : int or None, default=4

Maximum vine depth (truncation level). Must be greater than 1. If None, no truncation is applied.

log_transform : bool | dict[str, bool] | array-like of shape (n_assets,), default=False

If True, the simple returns provided as input will be transformed to log returns before fitting the vine copula. That is, each return R is transformed via r = log(1+R). After sampling, the generated log returns are converted back to simple returns using R = exp(r) - 1.

If a boolean is provided, it is applied to every asset. If a dictionary is provided, its keys must be asset names and its values booleans, and the input X of the fit method must be a DataFrame with the asset names as columns.
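A minimal sketch of the log_transform round trip, in pure Python with hypothetical helper names. `math.log1p` and `math.expm1` compute log(1 + R) and exp(r) - 1 accurately when returns are small.

```python
import math


def to_log_returns(simple_returns):
    """r = log(1 + R); log1p stays accurate when R is small."""
    return [math.log1p(R) for R in simple_returns]


def to_simple_returns(log_returns):
    """R = exp(r) - 1; expm1 stays accurate when r is small."""
    return [math.expm1(r) for r in log_returns]


R = [0.05, -0.10, 0.0, 0.20]
r = to_log_returns(R)
R_back = to_simple_returns(r)  # round trip back to simple returns
```

The log return of a loss is more negative than the simple return (log(0.9) is about -0.105), and the back-transform recovers the original simple returns up to floating-point error.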

central_assets : array-like of asset names or asset positions, optional

Assets that should be centrally placed during vine construction. If None, no asset is forced to the center. If an array-like of integers is provided, its values must be asset positions. If an array-like of strings is provided, its values must be asset names and the input X of the fit method must be a DataFrame with the asset names as columns. Assets marked as central are forced to occupy central positions in the vine, leading to a C-like or clustered structure. This is recommended for conditional sampling, where the conditioning assets should be central nodes.

For example:

If only asset 1 is marked as central, it will be connected to all other assets in the first tree (yielding a C-like structure for the initial tree), with subsequent trees following the standard R-vine pattern.
If assets 1 and 2 are marked as central, they will be connected together and the remaining assets will connect to either asset 1 or asset 2 (forming a clustered structure in the initial trees). In the next tree, the edge between asset 1 and asset 2 becomes the central node, with subsequent trees following the standard R-vine structure.
This logic extends naturally to more than two central assets.
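As an illustration only, the hypothetical function below builds a first tree that follows the pattern described above. It assumes that the absolute Kendall tau of each pair is already available, that central assets are linked to each other in a simple chain, and that each remaining asset attaches to the central asset it depends on most; skfolio's actual construction may differ, and later trees are not covered. The dependence values are made-up toy numbers.

```python
def first_tree_with_centrals(names, abs_tau, centrals):
    """Hypothetical first tree: chain the central assets together, then attach
    each remaining asset to the central asset with the largest |tau|."""

    def dep(a, b):
        # Pairwise dependence lookup, independent of the order of the pair.
        return abs_tau[frozenset((a, b))]

    edges = list(zip(centrals, centrals[1:]))  # links between central assets
    for name in names:
        if name not in centrals:
            hub = max(centrals, key=lambda c: dep(name, c))
            edges.append((hub, name))
    return edges


# Made-up |Kendall tau| values, keyed by unordered asset pairs.
abs_tau = {
    frozenset(pair): w
    for pair, w in [
        (("A", "B"), 0.7), (("A", "C"), 0.6), (("B", "C"), 0.2),
        (("A", "D"), 0.1), (("B", "D"), 0.5), (("A", "E"), 0.4), (("B", "E"), 0.3),
    ]
}
names = ["A", "B", "C", "D", "E"]
star = first_tree_with_centrals(names, abs_tau, ["A"])            # C-like tree
clustered = first_tree_with_centrals(names, abs_tau, ["A", "B"])  # two clusters
```

With one central asset every edge touches A; with two, C and E join A while D joins B, and both results remain trees with len(names) - 1 edges.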

dependence_method : DependenceMethod, default=DependenceMethod.KENDALL_TAU

The dependence measure used to compute edge weights for the maximum spanning tree (MST). Possible values are:

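DependenceMethod.KENDALL_TAU : Kendall's tau
DependenceMethod.MUTUAL_INFORMATION : Mutual information
DependenceMethod.WASSERSTEIN_DISTANCE : Wasserstein distance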
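The sketch below shows how a first tree can be chosen from pairwise dependence, in the spirit of Dißmann's algorithm [1]: edges are weighted by the absolute Kendall tau and a maximum spanning tree is grown with Kruskal's algorithm. It is a pure-Python illustration under stated assumptions (tau-a without tie correction, toy data, hypothetical helper names), not skfolio's implementation, and it covers only the KENDALL_TAU option.

```python
from itertools import combinations


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs (no ties assumed)."""
    n = len(x)
    s = 0
    for i, j in combinations(range(n), 2):
        prod = (x[i] - x[j]) * (y[i] - y[j])
        s += (prod > 0) - (prod < 0)  # +1 concordant, -1 discordant
    return s / (n * (n - 1) / 2)


def maximum_spanning_tree(n_nodes, weights):
    """Kruskal's algorithm: scan edges by decreasing weight, keep those joining two components."""
    parent = list(range(n_nodes))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    tree = []
    for (i, j), _ in sorted(weights.items(), key=lambda kv: kv[1], reverse=True):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            tree.append((i, j))
    return tree


# Toy rank data for three assets (one list per asset).
data = [
    [1, 2, 3, 4, 5],
    [2, 1, 4, 3, 5],
    [5, 3, 4, 1, 2],
]
weights = {
    (i, j): abs(kendall_tau(data[i], data[j]))
    for i, j in combinations(range(len(data)), 2)
}
tree = maximum_spanning_tree(len(data), weights)
```

Here asset 0 has |tau| = 0.6 with both other assets, while assets 1 and 2 only reach |tau| = 0.2, so the tree keeps edges (0, 1) and (0, 2). Using the absolute value lets strong negative dependence count as much as strong positive dependence.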

selection_criterion : SelectionCriterion, default=SelectionCriterion.AIC

The criterion used for univariate and copula selection. Possible values are:

SelectionCriterion.AIC : Akaike Information Criterion
SelectionCriterion.BIC : Bayesian Information Criterion

independence_level : float, default=0.05

Significance level used for the Kendall tau independence test during copula fitting. If the p-value exceeds this threshold, the null hypothesis of independence is not rejected, and the pair copula is modeled using the IndependentCopula() class.
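A sketch of such an independence test, assuming the classical large-sample normal approximation for Kendall's tau under independence, with variance 2(2n + 5) / (9n(n - 1)) and no tie correction. skfolio may compute the p-value differently (for example with an exact test for small samples); the helper names and the returned labels are hypothetical.

```python
import math


def kendall_independence_pvalue(tau, n):
    """Two-sided p-value for H0 (independence) from the normal approximation of
    Kendall's tau, Var(tau) = 2(2n + 5) / (9n(n - 1)), without tie correction."""
    var = 2.0 * (2 * n + 5) / (9.0 * n * (n - 1))
    z = tau / math.sqrt(var)
    return math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))


def pair_copula_choice(tau, n, independence_level=0.05):
    """Hypothetical decision rule mirroring the description of independence_level."""
    if kendall_independence_pvalue(tau, n) > independence_level:
        return "IndependentCopula"
    return "select among copula_candidates"
```

With 500 observations, tau = 0.02 gives a p-value of about 0.5, so the pair is treated as independent, while tau = 0.2 gives a p-value far below 0.05.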

n_jobs : int, optional

The number of jobs to run in parallel when fitting all estimators. The value -1 means using all processors. The default (None) means 1 unless in a joblib.parallel_backend context.

random_state : int, RandomState instance or None, default=None

Seed or random state to ensure reproducibility.

Attributes:

trees_ : list[Tree]

List of constructed vine trees.

marginal_distributions_ : list[BaseUnivariateDist]

List of fitted marginal distributions (if fit_marginals is True).

n_features_in_ : int

Number of assets seen during fit.

feature_names_in_ : ndarray of shape (n_features_in_,)

Names of assets seen during fit. Defined only when X has asset names that are all strings.

References

[1] “Selecting and estimating regular vine copulae and application to financial returns”, Dißmann, Brechmann, Czado, and Kurowicka (2013).

[2] “Growing simplified vine copula trees: improving Dißmann’s algorithm”, Kraus and Czado (2017).

[3] “Pair-copula constructions of multiple dependence”, Aas, Czado, Frigessi, and Bakken (2009).

[4] “Pair-Copula Constructions for Financial Applications: A Review”, Aas and Czado (2016).

[5] “Conditional copula simulation for systemic risk stress testing”, Brechmann, Hendrich, and Czado (2013).

Examples

>>> from skfolio.datasets import load_factors_dataset
>>> from skfolio.preprocessing import prices_to_returns
>>> from skfolio.distribution import VineCopula
>>>
>>> # Load historical prices and convert them to returns
>>> prices = load_factors_dataset()
>>> X = prices_to_returns(prices)
>>>
>>> # Instantiate the VineCopula model
>>> vine = VineCopula()
>>> # Fit the model
>>> vine.fit(X)
>>> # Display the vine trees and fitted copulas
>>> vine.display_vine()
>>> # Log-likelihood, AIC and BIC
>>> vine.score(X)
>>> vine.aic(X)
>>> vine.bic(X)
>>>
>>> # Generate 10 samples from the fitted vine copula
>>> samples = vine.sample(n_samples=10)
>>>
>>> # Set QUAL, SIZE and MTUM as central
>>> vine = VineCopula(central_assets=["QUAL", "SIZE", "MTUM"])
>>> vine.fit(X)
>>> # Sample by conditioning on QUAL and SIZE returns
>>> samples = vine.sample(
...    n_samples=4,
...    conditioning={
...        "QUAL": [-0.1, -0.2, -0.3, -0.4],
...        "SIZE": -0.2,
...        "MTUM": (None, -0.3) # MTUM sampled between -Inf and -30%
...    },
... )
>>> # Plot the scatter matrix of sampled returns vs. historical X
>>> fig = vine.plot_scatter_matrix(X=X)
>>> fig.show()
>>>
>>> # Plot univariate distributions of sampled returns vs. historical X
>>> fig = vine.plot_marginal_distributions(X=X)
>>> fig.show()

Methods

`aic`(X)	Compute the Akaike Information Criterion (AIC) for the model given data X.
`bic`(X)	Compute the Bayesian Information Criterion (BIC) for the model given data X.
`clear_cache`([clear_count])	Clear cached intermediate results in the vine trees.
`display_vine`()	Display the vine trees and fitted copulas.
`fit`(X[, y])	Fit the Vine Copula model to the data.
`get_metadata_routing`()	Get metadata routing of this object.
`get_params`([deep])	Get parameters for this estimator.
`plot_marginal_distributions`([X, ...])	Plot overlaid marginal distributions.
`plot_scatter_matrix`([X, conditioning, ...])	Plot the vine copula scatter matrix by generating samples from the fitted distribution model and comparing it versus the empirical distribution of `X` if provided.
`sample`([n_samples, conditioning])	Generate random samples from the vine copula.
`score`(X[, y])	Compute the total log-likelihood under the model.
`score_samples`(X)	Compute the log-likelihood of each sample (log-pdf) under the model.
`set_params`(**params)	Set the parameters of this estimator.

aic(X)

Compute the Akaike Information Criterion (AIC) for the model given data X.

The AIC is defined as:

\[\mathrm{AIC} = -2 \, \log L \;+\; 2 k,\]

where

\(\log L\) is the total log-likelihood
\(k\) is the number of parameters in the model

A lower AIC value indicates a better trade-off between model fit and complexity.
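For illustration, the sketch below computes the AIC from a maximized log-likelihood and a parameter count, then picks the candidate with the lowest value, which is how an AIC-based selection criterion ranks models. The candidate log-likelihoods and parameter counts are made-up numbers, not skfolio output.

```python
def aic(log_likelihood, k):
    """AIC = -2 log L + 2k."""
    return -2.0 * log_likelihood + 2.0 * k


# Made-up (maximized log-likelihood, number of parameters) for three candidates.
candidates = {
    "Gaussian": (1250.0, 2),
    "StudentT": (1262.0, 3),
    "JohnsonSU": (1262.5, 4),
}
scores = {name: aic(ll, k) for name, (ll, k) in candidates.items()}
best = min(scores, key=scores.get)  # lowest AIC wins
```

JohnsonSU attains the highest log-likelihood, but its 1.0 improvement in -2 log L does not pay for the extra parameter's penalty of 2, so StudentT is selected (AIC -2518 vs. -2517).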

Parameters:

X : array-like of shape (n_observations, n_features)

The input data on which to compute the AIC.

Returns:

aic : float

The AIC of the fitted model on the given data.

Notes

In practice, both AIC and BIC measure the trade-off between model fit and complexity, but BIC tends to prefer simpler models for a large number of observations \(n\) because of its \(\ln(n)\) penalty term.

References

[1] “A new look at the statistical model identification”, Akaike (1974).

bic(X)

Compute the Bayesian Information Criterion (BIC) for the model given data X.

The BIC is defined as:

\[\mathrm{BIC} = -2 \, \log L \;+\; k \,\ln(n),\]

where

\(\log L\) is the (maximized) total log-likelihood
\(k\) is the number of parameters in the model
\(n\) is the number of observations

A lower BIC value suggests a better fit while imposing a stronger penalty for model complexity than the AIC.

Parameters:

X : array-like of shape (n_observations, n_features)

The input data on which to compute the BIC.

Returns:

bic : float

The BIC of the fitted model on the given data.

Notes

In practice, both AIC and BIC measure the trade-off between model fit and complexity, but BIC tends to prefer simpler models for large \(n\) because of the \(\ln(n)\) term.
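The statement that BIC penalizes complexity more than AIC holds once \(\ln(n) > 2\), that is for \(n > e^2 \approx 7.39\). The short pure-Python check below computes both criteria from a log-likelihood, a parameter count and a sample size (hypothetical helper names) and finds the smallest integer \(n\) at which the BIC penalty per parameter exceeds the AIC one.

```python
import math


def aic(log_likelihood, k):
    return -2.0 * log_likelihood + 2.0 * k


def bic(log_likelihood, k, n):
    return -2.0 * log_likelihood + k * math.log(n)


# Smallest sample size at which ln(n), the BIC penalty per parameter, exceeds 2.
crossover = next(n for n in range(2, 100) if math.log(n) > 2.0)

# With the same fit, BIC exceeds AIC by k * (ln(n) - 2) for any n >= crossover.
ll, k = 100.0, 3
gap = bic(ll, k, 1000) - aic(ll, k)
```

For typical return histories (hundreds or thousands of observations) the BIC penalty is several times larger than the AIC penalty.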

References

[1] “Estimating the dimension of a model”, Schwarz (1978).

clear_cache(clear_count=True): Clear cached intermediate results in the vine trees.

display_vine(): Display the vine trees and fitted copulas. Prints the structure of each tree and the details of each edge.

fit(X, y=None)

Fit the Vine Copula model to the data.

Parameters:

X : array-like of shape (n_observations, n_assets)

Price returns of the assets.

y : Ignored

Not used, present for API consistency by convention.

Returns:

self : VineCopula

The fitted VineCopula instance.

Raises:

ValueError: If the number of assets is less than or equal to 2, or if max_depth <= 1.

property fitted_repr: String representation of the fitted Vine Copula.

get_metadata_routing()

Get metadata routing of this object.

Please check the User Guide on how the routing mechanism works.

Returns:

routing : MetadataRequest

A MetadataRequest encapsulating routing information.

get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep : bool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params : dict

Parameter names mapped to their values.

property n_params: Number of model parameters.

plot_marginal_distributions(X=None, conditioning=None, subset=None, n_samples=500, percentile_cutoff=None, title='Vine Copula Marginal Distributions')

Plot overlaid marginal distributions.

Parameters:

X : array-like of shape (n_samples, n_assets), optional

Historical data where each column corresponds to an asset.

conditioning : dict[int | str, float | tuple[float, float] | array-like], optional

A dictionary specifying conditioning information for one or more assets. The dictionary keys are asset indices or names, and the values define how the samples are conditioned for that asset. Three types of conditioning values are supported:

Fixed value (float): If a float is provided, all samples are generated under the condition that the asset takes exactly that value.
Bounds (tuple of two floats): If a tuple (min_value, max_value) is provided, samples are generated under the condition that the asset’s value falls within the specified bounds. Use -np.inf for no lower bound or np.inf for no upper bound.
Array-like (1D array): If an array-like of length n_samples is provided, each sample is conditioned on the corresponding value in the array for that asset.

When using conditional sampling, it is recommended that the assets you condition on are set as central during the vine copula construction. This can be specified via the central_assets parameter in the vine copula instantiation.

subset : list[int | str], optional

Indices or names of assets to include in the plot. If None, all assets are used.

n_samples : int, default=500

Number of samples used to control the density and readability of the plot. If X is provided and contains more than n_samples rows, a random subsample of size n_samples is selected. Conversely, if X has fewer rows than n_samples, the value is adjusted to match the number of rows in X to ensure balanced visualization.
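The subsampling rule above can be sketched as follows. This is a hypothetical pure-Python helper operating on a list of rows, not skfolio's code; it only illustrates how the historical rows and n_samples are balanced.

```python
import random


def subsample_for_plot(X, n_samples, seed=None):
    """Hypothetical helper: balance the number of historical rows and samples."""
    if len(X) > n_samples:
        # More history than needed: draw a random subsample without replacement.
        X = random.Random(seed).sample(X, n_samples)
    else:
        # Less history than requested: shrink n_samples to len(X).
        n_samples = len(X)
    return X, n_samples


history = [[i / 1000, -i / 1000] for i in range(1000)]
X_big, n_big = subsample_for_plot(history, 500, seed=0)
X_small, n_small = subsample_for_plot(history[:100], 500)
```

In both cases the plot ends up comparing the same number of historical and generated points.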

percentile_cutoff : float, default=None

Percentile cutoff for tail truncation, in percent. If a float p is provided, the distribution support is truncated at the p-th and (100 - p)-th percentiles. If None, no truncation is applied (the full min/max range of the returns is used).
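A sketch of the truncation rule, assuming linear interpolation between order statistics (the default convention of `numpy.percentile`). The helper names are hypothetical and the data are a toy grid.

```python
def percentile(sorted_x, q):
    """q-th percentile (0 <= q <= 100) of a sorted list, by linear interpolation."""
    pos = (len(sorted_x) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_x) - 1)
    return sorted_x[lo] + (sorted_x[hi] - sorted_x[lo]) * (pos - lo)


def plot_support(returns, percentile_cutoff=None):
    """Return the (low, high) range of returns displayed for one asset."""
    xs = sorted(returns)
    if percentile_cutoff is None:
        return xs[0], xs[-1]  # full min/max, no truncation
    p = percentile_cutoff
    return percentile(xs, p), percentile(xs, 100.0 - p)


returns = [i / 100 for i in range(101)]  # 0.00, 0.01, ..., 1.00
low, high = plot_support(returns, 5)     # truncated at the 5th and 95th percentiles
```

Truncating the support hides a few extreme observations so that the body of each distribution is easier to read.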

title : str, default="Vine Copula Marginal Distributions"

The title for the plot.

Returns:

fig : plotly.graph_objects.Figure

A figure with overlaid univariate distributions for each asset.

plot_scatter_matrix(X=None, conditioning=None, n_samples=1000, title='Scatter Matrix')

Plot the vine copula scatter matrix by generating samples from the fitted distribution model and comparing it versus the empirical distribution of X if provided.

Parameters:

X : array-like of shape (n_samples, n_assets), optional

If provided, it is used to plot the empirical scatter matrix for comparison versus the vine copula scatter matrix.

conditioning : dict[int | str, float | tuple[float, float] | array-like], optional

A dictionary specifying conditioning information for one or more assets. The dictionary keys are asset indices or names, and the values define how the samples are conditioned for that asset. Three types of conditioning values are supported:

Fixed value (float): If a float is provided, all samples are generated under the condition that the asset takes exactly that value.
Bounds (tuple of two floats): If a tuple (min_value, max_value) is provided, samples are generated under the condition that the asset’s value falls within the specified bounds. Use -np.inf for no lower bound or np.inf for no upper bound.
Array-like (1D array): If an array-like of length n_samples is provided, each sample is conditioned on the corresponding value in the array for that asset.

n_samples : int, default=1000

Number of samples used to control the density and readability of the plot. If X is provided and contains more than n_samples rows, a random subsample of size n_samples is selected. Conversely, if X has fewer rows than n_samples, the value is adjusted to match the number of rows in X to ensure balanced visualization.

title : str, default="Scatter Matrix"

The title for the plot.

Returns:

fig : plotly.graph_objects.Figure

A figure object containing the scatter matrix.

sample(n_samples=1, conditioning=None)

Generate random samples from the vine copula.

This method generates n_samples from the fitted vine copula model. The resulting samples represent multivariate observations drawn according to the dependence structure captured by the vine copula.

Parameters:

n_samples : int, default=1

Number of samples to generate.

conditioning : dict[int | str, float | tuple[float, float] | array-like], optional

A dictionary specifying conditioning information for one or more assets. The dictionary keys are asset indices or names, and the values define how the samples are conditioned for that asset. Three types of conditioning values are supported:

Fixed value (float): If a float is provided, all samples are generated under the condition that the asset takes exactly that value.
Bounds (tuple of two floats): If a tuple (min_value, max_value) is provided, samples are generated under the condition that the asset’s value falls within the specified bounds. Use -np.inf for no lower bound or np.inf for no upper bound.
Array-like (1D array): If an array-like of length n_samples is provided, each sample is conditioned on the corresponding value in the array for that asset.

Important: When using conditional sampling, it is recommended that the assets you condition on are set as central during the vine copula construction. This can be specified via the central_assets parameter in the vine copula instantiation.
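The three conditioning forms can be normalized into per-sample bounds, which makes their meaning explicit: a fixed value is a degenerate interval, a tuple is an interval, and an array gives one fixed value per sample. The helper below is a hypothetical pure-Python sketch, not skfolio's internal logic. Following the class example, it treats None inside a tuple as an open bound; that behavior is an assumption.

```python
import math


def expand_conditioning(conditioning, n_samples):
    """Hypothetical helper: map each conditioning value to n_samples (low, high) bounds.
    Equal low and high mean the asset is fixed at that value."""
    expanded = {}
    for asset, value in conditioning.items():
        if isinstance(value, tuple):  # bounds
            low, high = value
            low = -math.inf if low is None else float(low)
            high = math.inf if high is None else float(high)
            if low > high:
                raise ValueError(f"{asset}: lower bound exceeds upper bound")
            expanded[asset] = [(low, high)] * n_samples
        elif isinstance(value, (int, float)):  # fixed value
            expanded[asset] = [(float(value), float(value))] * n_samples
        else:  # array-like: one fixed value per sample
            values = [float(v) for v in value]
            if len(values) != n_samples:
                raise ValueError(f"{asset}: expected {n_samples} values, got {len(values)}")
            expanded[asset] = [(v, v) for v in values]
    return expanded


bounds = expand_conditioning(
    {"QUAL": [-0.1, -0.2, -0.3, -0.4], "SIZE": -0.2, "MTUM": (None, -0.3)},
    n_samples=4,
)
```

An array-like value must have exactly n_samples entries, since each entry conditions one generated sample.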

Returns:

X : array-like of shape (n_samples, n_assets)

A two-dimensional array where each row is a multivariate observation sampled from the vine copula.

score(X, y=None)

Compute the total log-likelihood under the model.

Parameters:

X : array-like of shape (n_observations, n_features)

An array of data points for which the total log-likelihood is computed.

y : None

Ignored. Provided for compatibility with scikit-learn’s API.

Returns:

logprob : float

The total log-likelihood (sum of log-pdf values).
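The relationship between score and score_samples can be illustrated with a toy stand-in model: the total log-likelihood is the sum of the per-observation log-pdf values. Here a univariate Gaussian replaces the fitted vine, so the functions are hypothetical and only show the aggregation.

```python
import math


def score_samples(X, mu=0.0, sigma=1.0):
    """Log-pdf of each observation under a toy Gaussian stand-in for the vine."""
    return [
        -0.5 * math.log(2.0 * math.pi * sigma**2) - (x - mu) ** 2 / (2.0 * sigma**2)
        for x in X
    ]


def score(X, mu=0.0, sigma=1.0):
    """Total log-likelihood: the sum of the per-observation log-pdf values."""
    return math.fsum(score_samples(X, mu, sigma))


X = [-1.0, 0.0, 0.5, 2.0]
total = score(X)
```

This total is the \(\log L\) term that enters the AIC and BIC formulas.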

score_samples(X)

Compute the log-likelihood of each sample (log-pdf) under the model.

Parameters:

X : array-like of shape (n_observations, n_assets)

Price returns of the assets.

Returns:

density : ndarray of shape (n_observations,)

The log-likelihood of each sample under the fitted vine copula.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
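The `<component>__<parameter>` convention can be sketched as a key-splitting step: each key is split at its first double underscore and the remainder is forwarded to the named component. This is a simplified illustration, not scikit-learn's implementation; the step name vine is hypothetical.

```python
def split_nested_params(params):
    """Split keys of the form '<component>__<parameter>' at the first '__'.
    Keys without '__' are parameters of the estimator itself."""
    own, nested = {}, {}
    for key, value in params.items():
        component, sep, rest = key.partition("__")
        if sep:
            nested.setdefault(component, {})[rest] = value  # forwarded to the component
        else:
            own[key] = value
    return own, nested


own, nested = split_nested_params(
    {"verbose": True, "vine__max_depth": 2, "vine__central_assets": ["QUAL"]}
)
```

In a scikit-learn Pipeline with a step named vine, `pipe.set_params(vine__max_depth=2)` would accordingly update the max_depth parameter of that step. Deeper keys such as `a__b__c` are forwarded as `b__c` to component `a`, which can split them again.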

Parameters:

**params : dict

Estimator parameters.

Returns:

self : estimator instance

Estimator instance.
