skfolio.distribution.VineCopula
- class skfolio.distribution.VineCopula(fit_marginals=True, marginal_candidates=None, copula_candidates=None, max_depth=4, log_transform=False, central_assets=None, dependence_method=KENDALL_TAU, selection_criterion=AIC, independence_level=0.05, n_jobs=None, random_state=None)
Regular Vine Copula Estimator.
This model first fits the best univariate distribution for each asset, transforming the data to uniform marginals via the fitted CDFs. Then, it constructs a regular vine copula by sequentially selecting the best bivariate copula from a list of candidates for each edge in the vine using a maximum spanning tree algorithm based on a given dependence measure [1].
Regular vines capture complex, fat-tailed dependencies and tail co-movements between asset returns.
It also supports conditional sampling, enabling stress testing and scenario analysis by generating samples under specified conditions.
Moreover, by marking some assets as central, this novel implementation is able to capture clustered or C-like dependency structures, allowing for more nuanced representation of hierarchical relationships among assets and improving conditional sampling and stress testing.
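The first step of the construction described above (weighting asset pairs by a dependence measure, then keeping a maximum spanning tree) can be sketched in plain Python. This is only an illustration under stated assumptions, not skfolio's implementation: the helper names `kendall_tau` and `maximum_spanning_tree` and the toy returns are hypothetical, ties are ignored (Kendall's tau-a), and the sketch stops at the first tree without fitting any pair copula.

```python
from itertools import combinations


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


def maximum_spanning_tree(names, weight):
    """Kruskal's algorithm with edges visited by decreasing weight."""
    parent = {name: name for name in names}

    def find(node):
        # Follow parent links up to the representative of the component.
        while parent[node] != node:
            node = parent[node]
        return node

    tree = []
    edges = sorted(combinations(names, 2), key=lambda e: weight[e], reverse=True)
    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:  # the edge joins two components, so no cycle
            parent[root_a] = root_b
            tree.append((a, b))
    return tree


# Toy returns for three hypothetical assets (illustrative values only).
returns = {
    "A": [0.01, -0.02, 0.03, 0.00, -0.01],
    "B": [0.02, -0.01, 0.04, 0.01, -0.03],
    "C": [-0.01, 0.03, -0.02, 0.02, 0.00],
}
names = list(returns)

# Edge weight = absolute Kendall tau, so strong negative dependence also counts.
weight = {
    (a, b): abs(kendall_tau(returns[a], returns[b]))
    for a, b in combinations(names, 2)
}
first_tree = maximum_spanning_tree(names, weight)
print(first_tree)
```

In this toy data the pairs (A, B) and (A, C) carry the strongest dependence, so the first tree connects both B and C to A. In the estimator, each retained edge would then receive the best bivariate copula among the candidates.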
- Parameters:
- fit_marginals : bool, default=True
Whether to fit marginal distributions to each asset before constructing the vine. If True, the data will be transformed to uniform marginals using the fitted CDFs.
- marginal_candidates : list[BaseUnivariateDist], optional
Candidate univariate distribution estimators to fit the marginals. If None, defaults to [Gaussian(), StudentT(), JohnsonSU()].
- copula_candidates : list[BaseBivariateCopula], optional
Candidate bivariate copula estimators. If None, defaults to [GaussianCopula(), StudentTCopula(), ClaytonCopula(), GumbelCopula(), JoeCopula()].
- max_depth : int or None, default=4
Maximum vine depth (truncated level). Must be greater than 1. None means that no truncation is applied. The default is 4.
- log_transform : bool | dict[str, bool] | array-like of shape (n_assets,), default=False
If True, the simple returns provided as input will be transformed to log returns before fitting the vine copula. That is, each return R is transformed via r = log(1+R). After sampling, the generated log returns are converted back to simple returns using R = exp(r) - 1.
If a boolean is provided, it is applied to each asset. If a dictionary is provided, its (key/value) pair must be the (asset name/boolean) and the input X of the fit method must be a DataFrame with the asset names in columns.
- central_assets : array-like of asset names or asset positions, optional
Assets that should be centrally placed during vine construction. If None, no asset is forced to the center. If an array-like of integers is provided, its values must be asset positions. If an array-like of strings is provided, its values must be asset names and the input X of the fit method must be a DataFrame with the asset names in columns. Assets marked as central are forced to occupy central positions in the vine, leading to a C-like or clustered structure. This is needed for conditional sampling, where the conditioning assets should be central nodes.
For example:
- If only asset 1 is marked as central, it will be connected to all other assets in the first tree (yielding a C-like structure for the initial tree), with subsequent trees following the standard R-vine pattern.
- If asset 1 and asset 2 are marked as central, they will be connected together and the remaining assets will connect to either asset 1 or asset 2 (forming a clustered structure in the initial trees). In the next tree, the edge between asset 1 and asset 2 becomes the central node, with subsequent trees following the standard R-vine structure.
This logic extends naturally to more than two central assets.
- dependence_method : DependenceMethod, default=DependenceMethod.KENDALL_TAU
The dependence measure used to compute edge weights for the maximum spanning tree (MST). Possible values are:
KENDALL_TAU
MUTUAL_INFORMATION
WASSERSTEIN_DISTANCE
- selection_criterion : SelectionCriterion, default=SelectionCriterion.AIC
The criterion used for univariate and copula selection. Possible values are:
SelectionCriterion.AIC : Akaike Information Criterion
SelectionCriterion.BIC : Bayesian Information Criterion
- independence_level : float, default=0.05
Significance level used for the Kendall tau independence test during copula fitting. If the p-value exceeds this threshold, the null hypothesis of independence is not rejected, and the pair copula is modeled using the IndependentCopula() class.
- n_jobs : int, optional
The number of jobs to run in parallel for fit of all estimators. The value -1 means using all processors. The default (None) means 1 unless in a joblib.parallel_backend context.
- random_state : int, RandomState instance or None, default=None
Seed or random state to ensure reproducibility.
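The log_transform option above relies on the pair of formulas r = log(1 + R) and R = exp(r) - 1. A minimal sketch of that round trip, using only the standard library and made-up return values (the variable names are illustrative, not part of skfolio):

```python
import math

# Simple returns for one hypothetical asset (illustrative values only).
simple_returns = [0.05, -0.10, 0.00, 0.25]

# Forward transform applied before fitting: r = log(1 + R).
log_returns = [math.log1p(R) for R in simple_returns]

# Inverse transform applied after sampling: R = exp(r) - 1.
recovered = [math.expm1(r) for r in log_returns]

for R, r, back in zip(simple_returns, log_returns, recovered):
    print(f"R={R:+.4f}  r={r:+.6f}  back={back:+.4f}")
```

One consequence of sampling in log-return space is that the back-transformed simple returns cannot fall below -100%, since exp(r) - 1 is bounded below by -1.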
- Attributes:
- trees_ : list[Tree]
List of constructed vine trees.
- marginal_distributions_ : list[BaseUnivariateDist]
List of fitted marginal distributions (if fit_marginals is True).
- n_features_in_ : int
Number of assets seen during fit.
- feature_names_in_ : ndarray of shape (n_features_in_,)
Names of assets seen during fit. Defined only when X has asset names that are all strings.
References
[1] “Selecting and estimating regular vine copulae and application to financial returns”, Dißmann, Brechmann, Czado, and Kurowicka (2013).
[2] “Growing simplified vine copula trees: improving Dißmann’s algorithm”, Kraus and Czado (2017).
[3] “Pair-copula constructions of multiple dependence”, Aas, Czado, Frigessi, and Bakken (2009).
[4] “Pair-Copula Constructions for Financial Applications: A Review”, Aas and Czado (2016).
[5] “Conditional copula simulation for systemic risk stress testing”, Brechmann, Hendrich, and Czado (2013).
Examples
>>> from skfolio.datasets import load_factors_dataset
>>> from skfolio.preprocessing import prices_to_returns
>>> from skfolio.distribution import VineCopula
>>>
>>> # Load historical prices and convert them to returns
>>> prices = load_factors_dataset()
>>> X = prices_to_returns(prices)
>>>
>>> # Instantiate the VineCopula model
>>> vine = VineCopula()
>>> # Fit the model
>>> vine.fit(X)
>>> # Display the vine trees and fitted copulas
>>> vine.display_vine()
>>> # Log-likelihood, AIC and BIC
>>> vine.score(X)
>>> vine.aic(X)
>>> vine.bic(X)
>>>
>>> # Generate 10 samples from the fitted vine copula
>>> samples = vine.sample(n_samples=10)
>>>
>>> # Set QUAL, SIZE and MTUM as central
>>> vine = VineCopula(central_assets=["QUAL", "SIZE", "MTUM"])
>>> vine.fit(X)
>>> # Sample by conditioning on QUAL, SIZE and MTUM returns
>>> samples = vine.sample(
...     n_samples=4,
...     conditioning={
...         "QUAL": [-0.1, -0.2, -0.3, -0.4],
...         "SIZE": -0.2,
...         "MTUM": (None, -0.3),  # MTUM sampled between -Inf and -30%
...     },
... )
>>> # Plot the scatter matrix of sampled returns vs historical X
>>> fig = vine.plot_scatter_matrix(X=X)
>>> fig.show()
>>>
>>> # Plot univariate distributions of sampled returns vs historical X
>>> fig = vine.plot_marginal_distributions(X=X)
>>> fig.show()
Methods
- aic(X): Compute the Akaike Information Criterion (AIC) for the model given data X.
- bic(X): Compute the Bayesian Information Criterion (BIC) for the model given data X.
- clear_cache([clear_count]): Clear cached intermediate results in the vine trees.
- display_vine(): Display the vine trees and fitted copulas.
- fit(X[, y]): Fit the Vine Copula model to the data.
- get_metadata_routing(): Get metadata routing of this object.
- get_params([deep]): Get parameters for this estimator.
- plot_marginal_distributions([X, ...]): Plot overlaid marginal distributions.
- plot_scatter_matrix([X, conditioning, ...]): Plot the vine copula scatter matrix by generating samples from the fitted distribution model and comparing it versus the empirical distribution of X if provided.
- sample([n_samples, conditioning]): Generate random samples from the vine copula.
- score(X[, y]): Compute the total log-likelihood under the model.
- score_samples(X): Compute the log-likelihood of each sample (log-pdf) under the model.
- set_params(**params): Set the parameters of this estimator.
- aic(X)
Compute the Akaike Information Criterion (AIC) for the model given data X.
The AIC is defined as:
\[\mathrm{AIC} = -2 \, \log L \;+\; 2 k,\]
where
\(\log L\) is the total log-likelihood
\(k\) is the number of parameters in the model
A lower AIC value indicates a better trade-off between model fit and complexity.
- Parameters:
- X : array-like of shape (n_observations, n_features)
The input data on which to compute the AIC.
- Returns:
- aic : float
The AIC of the fitted model on the given data.
Notes
In practice, both AIC and BIC measure the trade-off between model fit and complexity, but BIC tends to prefer simpler models for large \(n\) because of the \(\ln(n)\) term.
References
[1]“A new look at the statistical model identification”, Akaike (1974).
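The AIC formula above is simple enough to check by hand. The sketch below uses a hypothetical standalone helper (not the estimator's aic method, which derives the log-likelihood and parameter count from the fitted vine) and made-up numbers to show how the parameter penalty can outweigh a small likelihood gain.

```python
def aic(log_likelihood: float, n_params: int) -> float:
    """AIC = -2 log L + 2 k."""
    return -2.0 * log_likelihood + 2.0 * n_params


# Two hypothetical candidate models fitted on the same data.
candidate_a = aic(log_likelihood=1250.0, n_params=6)   # -2500 + 12 = -2488
candidate_b = aic(log_likelihood=1253.0, n_params=12)  # -2506 + 24 = -2482

# Lower is better: candidate A wins although B has the higher likelihood.
print(candidate_a, candidate_b)
```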
- bic(X)
Compute the Bayesian Information Criterion (BIC) for the model given data X.
The BIC is defined as:
\[\mathrm{BIC} = -2 \, \log L \;+\; k \,\ln(n),\]
where
\(\log L\) is the (maximized) total log-likelihood
\(k\) is the number of parameters in the model
\(n\) is the number of observations
A lower BIC value indicates a better trade-off between model fit and complexity. The BIC penalizes each parameter more heavily than the AIC whenever \(\ln(n) > 2\), that is, for \(n \geq 8\) observations.
- Parameters:
- X : array-like of shape (n_observations, n_features)
The input data on which to compute the BIC.
- Returns:
- bic : float
The BIC of the fitted model on the given data.
Notes
In practice, both AIC and BIC measure the trade-off between model fit and complexity, but BIC tends to prefer simpler models for large \(n\) because of the \(\ln(n)\) term.
References
[1]“Estimating the dimension of a model”, Schwarz, G. (1978).
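To make the note about the \(\ln(n)\) term concrete, the sketch below compares both criteria on two hypothetical models. The helper functions and all numbers are illustrative assumptions, not values produced by the library.

```python
import math


def aic(log_likelihood: float, n_params: int) -> float:
    """AIC = -2 log L + 2 k."""
    return -2.0 * log_likelihood + 2.0 * n_params


def bic(log_likelihood: float, n_params: int, n_observations: int) -> float:
    """BIC = -2 log L + k ln(n)."""
    return -2.0 * log_likelihood + n_params * math.log(n_observations)


n = 1000  # hypothetical number of observations

# A small model and a larger model with a somewhat higher log-likelihood.
aic_small, aic_large = aic(1250.0, 6), aic(1258.0, 12)
bic_small, bic_large = bic(1250.0, 6, n), bic(1258.0, 12, n)

# With ln(1000) of about 6.9 per parameter, BIC charges more than AIC's 2.
print("AIC prefers:", "large" if aic_large < aic_small else "small")
print("BIC prefers:", "large" if bic_large < bic_small else "small")
```

With these inputs the two criteria disagree: the AIC selects the larger model, while the heavier BIC penalty tips the choice toward the smaller one.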
- display_vine()
Display the vine trees and fitted copulas. Prints the structure of each tree and the details of each edge.
- fit(X, y=None)
Fit the Vine Copula model to the data.
- Parameters:
- X : array-like of shape (n_observations, n_assets)
Price returns of the assets.
- y : Ignored
Not used, present for API consistency by convention.
- Returns:
- self : VineCopula
The fitted VineCopula instance.
- Raises:
- ValueError
If the number of assets is less than or equal to 2, or if max_depth <= 1.
- property fitted_repr
String representation of the fitted Vine Copula.
- get_metadata_routing()
Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns:
- routing : MetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)
Get parameters for this estimator.
- Parameters:
- deep : bool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- params : dict
Parameter names mapped to their values.
- property n_params
Number of model parameters.
- plot_marginal_distributions(X=None, conditioning=None, subset=None, n_samples=500, title='Vine Copula Marginal Distributions')
Plot overlaid marginal distributions.
- Parameters:
- X : array-like of shape (n_samples, n_assets), optional
Historical data where each column corresponds to an asset.
- conditioning : dict[int | str, float | tuple[float, float] | array-like], optional
A dictionary specifying conditioning information for one or more assets. The dictionary keys are asset indices or names, and the values define how the samples are conditioned for that asset. Three types of conditioning values are supported:
Fixed value (float): If a float is provided, all samples are generated under the condition that the asset takes exactly that value.
Bounds (tuple of two floats): If a tuple (min_value, max_value) is provided, samples are generated under the condition that the asset’s value falls within the specified bounds. Use -np.inf for no lower bound or np.inf for no upper bound.
Array-like (1D array): If an array-like of length n_samples is provided, each sample is conditioned on the corresponding value in the array for that asset.
Important: When using conditional sampling, it is recommended that the assets you condition on are set as central during the vine copula construction. This can be specified via the central_assets parameter in the vine copula instantiation.
- subset : list[int | str], optional
Indices or names of assets to include in the plot. If None, all assets are used.
- n_samples : int, default=500
Number of samples used to control the density and readability of the plot. If X is provided and contains more than n_samples rows, a random subsample of size n_samples is selected. Conversely, if X has fewer rows than n_samples, the value is adjusted to match the number of rows in X to ensure balanced visualization.
- title : str, default="Vine Copula Marginal Distributions"
The title for the plot.
- Returns:
- fig : plotly.graph_objects.Figure
A figure with overlaid univariate distributions for each asset.
- plot_scatter_matrix(X=None, conditioning=None, n_samples=1000, title='Scatter Matrix')
Plot the vine copula scatter matrix by generating samples from the fitted distribution model and comparing it versus the empirical distribution of X if provided.
- Parameters:
- X : array-like of shape (n_samples, n_assets), optional
If provided, it is used to plot the empirical scatter matrix for comparison versus the vine copula scatter matrix.
- conditioning : dict[int | str, float | tuple[float, float] | array-like], optional
A dictionary specifying conditioning information for one or more assets. The dictionary keys are asset indices or names, and the values define how the samples are conditioned for that asset. Three types of conditioning values are supported:
Fixed value (float): If a float is provided, all samples are generated under the condition that the asset takes exactly that value.
Bounds (tuple of two floats): If a tuple (min_value, max_value) is provided, samples are generated under the condition that the asset’s value falls within the specified bounds. Use -np.inf for no lower bound or np.inf for no upper bound.
Array-like (1D array): If an array-like of length n_samples is provided, each sample is conditioned on the corresponding value in the array for that asset.
- n_samples : int, default=1000
Number of samples used to control the density and readability of the plot. If X is provided and contains more than n_samples rows, a random subsample of size n_samples is selected. Conversely, if X has fewer rows than n_samples, the value is adjusted to match the number of rows in X to ensure balanced visualization.
- title : str, default="Scatter Matrix"
The title for the plot.
- Returns:
- fig : plotly.graph_objects.Figure
A figure object containing the scatter matrix.
- sample(n_samples=1, conditioning=None)
Generate random samples from the vine copula.
This method generates n_samples from the fitted vine copula model. The resulting samples represent multivariate observations drawn according to the dependence structure captured by the vine copula.
- Parameters:
- n_samples : int, default=1
Number of samples to generate.
- conditioning : dict[int | str, float | tuple[float, float] | array-like], optional
A dictionary specifying conditioning information for one or more assets. The dictionary keys are asset indices or names, and the values define how the samples are conditioned for that asset. Three types of conditioning values are supported:
Fixed value (float): If a float is provided, all samples are generated under the condition that the asset takes exactly that value.
Bounds (tuple of two floats): If a tuple (min_value, max_value) is provided, samples are generated under the condition that the asset’s value falls within the specified bounds. Use -np.inf for no lower bound or np.inf for no upper bound.
Array-like (1D array): If an array-like of length n_samples is provided, each sample is conditioned on the corresponding value in the array for that asset.
Important: When using conditional sampling, it is recommended that the assets you condition on are set as central during the vine copula construction. This can be specified via the central_assets parameter in the vine copula instantiation.
- Returns:
- X : array-like of shape (n_samples, n_assets)
A two-dimensional array where each row is a multivariate observation sampled from the vine copula.
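The three kinds of conditioning values described above can be told apart by type. The sketch below normalizes a conditioning dictionary into either per-sample values or a pair of bounds. It is a hypothetical helper written for illustration (`normalize_conditioning` is not a skfolio function), and it treats None in a tuple as an open bound, as the class-level example does with (None, -0.3).

```python
import math


def normalize_conditioning(conditioning: dict, n_samples: int) -> dict:
    """Map each asset to ("values", list) or ("bounds", (low, high))."""
    normalized = {}
    for asset, value in conditioning.items():
        if isinstance(value, tuple):
            # Bounds: None stands for an open side of the interval.
            if len(value) != 2:
                raise ValueError(f"Bounds for {asset!r} must have two entries.")
            low = -math.inf if value[0] is None else float(value[0])
            high = math.inf if value[1] is None else float(value[1])
            if low >= high:
                raise ValueError(f"Empty interval for {asset!r}.")
            normalized[asset] = ("bounds", (low, high))
        elif isinstance(value, (int, float)):
            # Fixed value: every sample is conditioned on the same number.
            normalized[asset] = ("values", [float(value)] * n_samples)
        else:
            # Array-like: one conditioning value per generated sample.
            values = [float(v) for v in value]
            if len(values) != n_samples:
                raise ValueError(f"{asset!r} needs exactly {n_samples} values.")
            normalized[asset] = ("values", values)
    return normalized


spec = normalize_conditioning(
    {
        "QUAL": [-0.1, -0.2, -0.3, -0.4],
        "SIZE": -0.2,
        "MTUM": (None, -0.3),
    },
    n_samples=4,
)
for asset, (kind, payload) in spec.items():
    print(asset, kind, payload)
```

Separating the "values" case from the "bounds" case up front keeps the later sampling logic simple: fixed and array-like inputs both reduce to one number per sample, while bounds describe a range to draw from.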
- score(X, y=None)
Compute the total log-likelihood under the model.
- Parameters:
- X : array-like of shape (n_observations, n_features)
An array of data points for which the total log-likelihood is computed.
- y : None
Ignored. Provided for compatibility with scikit-learn’s API.
- Returns:
- logprob : float
The total log-likelihood (sum of log-pdf values).
- score_samples(X)
Compute the log-likelihood of each sample (log-pdf) under the model.
- Parameters:
- X : array-like of shape (n_observations, n_assets)
Price returns of the assets.
- Returns:
- density : ndarray of shape (n_observations,)
The log-likelihood of each sample under the fitted vine copula.
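The relationship between score and score_samples (a total log-likelihood that is the sum of per-observation log-pdf values) can be illustrated with a stand-in density. The sketch below uses a univariate standard normal instead of a vine copula, purely to keep the example self-contained. The two functions are hypothetical and only mimic the method names.

```python
import math


def score_samples(xs):
    """Log-pdf of each observation under a standard normal (stand-in model)."""
    log_norm = 0.5 * math.log(2.0 * math.pi)
    return [-0.5 * x * x - log_norm for x in xs]


def score(xs):
    """Total log-likelihood: the sum of the per-observation log-pdf values."""
    return sum(score_samples(xs))


observations = [0.0, 1.0, -1.0]
per_sample = score_samples(observations)
total = score(observations)
print(per_sample, total)
```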
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
- **params : dict
Estimator parameters.
- Returns:
- self : estimator instance
Estimator instance.