skfolio.moments.ImpliedCovariance

class skfolio.moments.ImpliedCovariance(prior_covariance_estimator=None, annualized_factor=252.0, window_size=20, linear_regressor=None, volatility_risk_premium_adj=None, nearest=True, higham=False, higham_max_iteration=100)

Implied Covariance estimator.

For each asset, the realised volatility is estimated from the implied volatility time series using the non-overlapping log-transformed OLS model [6]:

\[\ln(RV_{t}) = \alpha + \beta_{1} \ln(IV_{t-1}) + \beta_{2} \ln(RV_{t-1}) + \epsilon\]

with \(\alpha\), \(\beta_{1}\) and \(\beta_{2}\) the intercept and coefficients to estimate, \(RV\) the realised volatility, and \(IV\) the implied volatility. The training set uses non-overlapping windows of window_size observations to avoid regression errors caused by auto-correlation. The volatilities are log-transformed because the resulting distribution has better finite-sample properties: it is closer to normality, less skewed and less leptokurtic [6].
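The regression above can be sketched with NumPy on synthetic data for a single asset. This is only an illustration of the model, not the skfolio implementation: the data is random, and the way the realised volatility of each window is measured (sample standard deviation) and the implied volatility is sampled (last value of each window) are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
window_size = 20
n_windows = 60

# Synthetic daily returns and daily-scaled implied volatilities (illustrative only).
returns = rng.normal(0.0, 0.01, size=window_size * n_windows)
implied_vol = 0.01 * np.exp(rng.normal(0.0, 0.1, size=window_size * n_windows))

# Assumption: realised volatility is the sample standard deviation of each
# non-overlapping window of returns.
rv = returns.reshape(n_windows, window_size).std(axis=1, ddof=1)
# Assumption: implied volatility is sampled at the end of each window.
iv = implied_vol.reshape(n_windows, window_size)[:, -1]

# Regress ln(RV_t) on an intercept, ln(IV_{t-1}) and ln(RV_{t-1}).
y = np.log(rv[1:])
features = np.column_stack(
    [np.ones(n_windows - 1), np.log(iv[:-1]), np.log(rv[:-1])]
)
(alpha, beta_1, beta_2), *_ = np.linalg.lstsq(features, y, rcond=None)

# Forecast for the next window, mapped back from log space.
pred_rv = np.exp(alpha + beta_1 * np.log(iv[-1]) + beta_2 * np.log(rv[-1]))
```

Because each window contributes a single observation, a sample of 1200 returns yields only 59 regression rows here, which is the price paid for avoiding overlapping, auto-correlated targets.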

Alternatively, if volatility_risk_premium_adj is provided, the realised volatility is estimated using:

\[RV_{t} = \frac{IV_{t-1}}{VRPA}\]

with \(VRPA\) the volatility risk premium adjustment.
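As a rough illustration of this adjustment, the sketch below expands a VRPA given as a float, a dictionary keyed by asset name, or a per-asset array into one value per asset and applies the formula. The helper `resolve_vrpa`, the asset names and all numbers are hypothetical, and the handling of missing keys or mismatched shapes is an assumption rather than skfolio's behaviour.

```python
import numpy as np


def resolve_vrpa(vrpa, asset_names):
    """Hypothetical helper: expand a VRPA specification to one value per asset."""
    if isinstance(vrpa, dict):
        # Assumption: every asset must have an entry in the dictionary.
        return np.array([vrpa[name] for name in asset_names], dtype=float)
    values = np.asarray(vrpa, dtype=float)
    if values.ndim == 0:
        # A single float is applied to every asset.
        return np.full(len(asset_names), float(values))
    if values.shape != (len(asset_names),):
        raise ValueError("expected one VRPA value per asset")
    return values


assets = ["A", "B", "C"]  # hypothetical asset names
implied_vol = np.array([0.24, 0.22, 0.30])  # hypothetical IV_{t-1}

# RV_t = IV_{t-1} / VRPA, asset by asset.
pred_from_dict = implied_vol / resolve_vrpa({"A": 1.2, "B": 1.1, "C": 1.0}, assets)
pred_from_float = implied_vol / resolve_vrpa(1.2, assets)
```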

The final step is the reconstruction of the covariance matrix from the correlation matrix \(Corr\) and the diagonal matrix \(D\) of estimated realised volatilities:

\[\Sigma = D \ Corr \ D\]

where \(Corr\) is the correlation matrix computed from the prior covariance estimator. The default prior is EmpiricalCovariance; any other covariance estimator can be used via prior_covariance_estimator.
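A minimal NumPy sketch of this reconstruction is given below. The prior covariance and the volatility forecasts are made-up numbers; the forecasts are assumed to be annualised and are scaled to the daily frequency of the prior by dividing by the square root of annualized_factor.

```python
import numpy as np

annualized_factor = 252.0

# Hypothetical prior covariance of daily returns for three assets.
prior_cov = np.array(
    [
        [1.00e-4, 0.60e-4, 0.20e-4],
        [0.60e-4, 2.25e-4, 0.50e-4],
        [0.20e-4, 0.50e-4, 0.64e-4],
    ]
)

# Hypothetical annualised volatility forecasts, scaled to daily frequency.
pred_vols = np.array([0.20, 0.28, 0.15]) / np.sqrt(annualized_factor)

# Correlation matrix implied by the prior covariance.
prior_std = np.sqrt(np.diag(prior_cov))
corr = prior_cov / np.outer(prior_std, prior_std)

# Sigma = D Corr D, with D the diagonal matrix of predicted volatilities.
D = np.diag(pred_vols)
covariance = D @ corr @ D
```

The result keeps the correlations of the prior and replaces only its volatilities: the square root of the diagonal of `covariance` equals `pred_vols`.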

Parameters:
prior_covariance_estimator : BaseCovariance, optional

Covariance estimator used to estimate the covariance matrix from which the correlations are derived, prior to the volatilities update. The default (None) is to use EmpiricalCovariance.

annualized_factor : float, default=252

Annualized factor (AF) used to convert the implied volatilities into the same frequency as the returns using \(\frac{IV}{\sqrt{AF}}\). The default is 252, which corresponds to daily returns and implied volatilities expressed per annum.

window_size : int, default=20

Window size used to construct the non-overlapping training set of realised volatilities and implied volatilities used in the regression. The default is 20 observations.

linear_regressor : BaseEstimator, optional

Estimator of the linear regression used to estimate the realised volatilities from the implied volatilities. The default is to use the scikit-learn OLS estimator LinearRegression.

volatility_risk_premium_adj : float | dict[str, float] | array-like of shape (n_assets,), optional

If provided, instead of using the regression model, the realised volatilities are estimated using:

\[RV_{t} = \frac{IV_{t-1}}{VRPA}\]

with \(VRPA\) the volatility risk premium adjustment.

If a float is provided, it is applied to each asset. If a dictionary is provided, its (key/value) pairs must be (asset name/asset \(VRPA\)) and the input X of the fit method must be a DataFrame with the asset names in columns.

nearest : bool, default=True

If this is set to True, the covariance is replaced by the nearest covariance matrix that is positive definite and has a Cholesky decomposition that can be computed. The variance is left unchanged. A covariance matrix that is not positive definite often occurs in high-dimensional problems. It can be due to multicollinearity, floating-point inaccuracies, or a number of observations smaller than the number of assets. For more details, see cov_nearest. The default is True.

higham : bool, default=False

If this is set to True, the Higham (2002) algorithm is used to find the nearest positive definite (PD) covariance; otherwise the eigenvalues are clipped to a threshold just above zero (1e-13). The default is False, which uses the clipping method because the Higham algorithm can be slow for large datasets.

higham_max_iteration : int, default=100

Maximum number of iterations of the Higham (2002) algorithm. The default value is 100.
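The eigenvalue-clipping idea mentioned for higham=False can be sketched as follows. This is an illustration under stated assumptions, not the cov_nearest implementation: the helper name `clip_to_positive_definite` is hypothetical, and clipping the eigenvalues of the correlation matrix and then rescaling its diagonal to one is only one plausible way to keep the variances unchanged.

```python
import numpy as np


def clip_to_positive_definite(cov, threshold=1e-13):
    """Hypothetical helper: clip eigenvalues while keeping the variances."""
    std = np.sqrt(np.diag(cov))
    corr = cov / np.outer(std, std)

    # Eigen-decomposition of the symmetric correlation matrix.
    eigvals, eigvecs = np.linalg.eigh(corr)
    clipped = np.clip(eigvals, threshold, None)
    corr_fixed = (eigvecs * clipped) @ eigvecs.T

    # Rescale so the diagonal is exactly one again, then restore the variances.
    scale = np.sqrt(np.diag(corr_fixed))
    corr_fixed = corr_fixed / np.outer(scale, scale)
    return corr_fixed * np.outer(std, std)


# An inconsistent set of pairwise correlations: not positive semi-definite.
std = np.array([0.2, 0.3, 0.1])
bad_corr = np.array([[1.0, 0.9, -0.9], [0.9, 1.0, 0.9], [-0.9, 0.9, 1.0]])
bad_cov = bad_corr * np.outer(std, std)

fixed_cov = clip_to_positive_definite(bad_cov)
```

With a threshold this close to zero the smallest eigenvalue of the result sits at the edge of floating-point precision, so a practical routine may need further safeguards before a Cholesky decomposition succeeds.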

Attributes:
covariance_ : ndarray of shape (n_assets, n_assets)

Estimated covariance matrix.

prior_covariance_estimator_ : BaseEstimator

Fitted prior covariance estimator.

pred_realised_vols_ : ndarray of shape (n_assets,)

The predicted realised volatilities.

linear_regressors_ : list[BaseEstimator]

The fitted linear regressions.

coefs_ : ndarray of shape (n_assets, 2)

The coefficients of the log transformed regression model for each asset.

intercepts_ : ndarray of shape (n_assets,)

The intercepts of the log transformed regression model for each asset.

n_features_in_ : int

Number of assets seen during fit.

feature_names_in_ : ndarray of shape (n_features_in_,)

Names of assets seen during fit. Defined only when the returns have asset names that are all strings.

Methods

fit(X[, y, implied_vol])

Fit the implied covariance estimator.

get_metadata_routing()

Get metadata routing of this object.

get_params([deep])

Get parameters for this estimator.

mahalanobis(X_test)

Compute the squared Mahalanobis distance of observations.

score(X_test[, y])

Compute the mean log-likelihood of observations under the estimated model.

set_fit_request(*[, implied_vol])

Configure whether metadata should be requested to be passed to the fit method.

set_params(**params)

Set the parameters of this estimator.

set_score_request(*[, X_test])

Configure whether metadata should be requested to be passed to the score method.

References

[1]

“New evidence on the implied-realized volatility relation”. Christensen & Hansen (2002).

[2]

“The relation between implied and realized volatility”. Christensen & Prabhala (1998).

[3]

“Can implied volatility predict returns on the carry trade?”. Egbers & Swinkels (2015).

[4]

“Volatility and correlation forecasting”. Egbers & Swinkels (2015).

[5]

“Volatility and correlation forecasting”. Andersen, Bollerslev, Christoffersen & Diebold (2006).

[6]

“How Well Does Implied Volatility Predict Future Stock Index Returns and Volatility? : A Study of Option-Implied Volatility Derived from OMXS30 Index Options”. Sara Vikberg & Julia Björkman (2020).

fit(X, y=None, implied_vol=None, **fit_params)

Fit the implied covariance estimator.

Parameters:
X : array-like of shape (n_observations, n_assets)

Price returns of the assets.

y : Ignored

Not used, present for API consistency by convention.

implied_vol : array-like of shape (n_observations, n_assets)

Implied volatilities of the assets.

**fit_params : dict

Parameters to pass to the underlying estimators. Only available if enable_metadata_routing=True, which can be set by using sklearn.set_config(enable_metadata_routing=True). See Metadata Routing User Guide for more details.

Returns:
self : ImpliedCovariance

Fitted estimator.

get_metadata_routing()

Get metadata routing of this object.

Please check the User Guide on how the routing mechanism works.

Returns:
routing : MetadataRequest

A MetadataRequest encapsulating routing information.

get_params(deep=True)

Get parameters for this estimator.

Parameters:
deep : bool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
params : dict

Parameter names mapped to their values.

mahalanobis(X_test)

Compute the squared Mahalanobis distance of observations.

The squared Mahalanobis distance of an observation \(r\) is defined as:

\[d^2 = (r - \mu)^T \Sigma^{-1} (r - \mu)\]

where \(\Sigma\) is the estimated covariance matrix (self.covariance_) and \(\mu\) is the estimated mean (self.location_ if available, otherwise zero).
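The formula can be evaluated directly with NumPy, as in the sketch below. The covariance and observations are made-up values, the mean is taken as zero, and the handling of missing values described in the parameters is deliberately ignored; this is not the method's actual code.

```python
import numpy as np

# Hypothetical covariance matrix and zero mean for two assets.
cov = np.array([[0.04, 0.01], [0.01, 0.09]])
mu = np.zeros(2)

# Three hypothetical observations, one per row.
r = np.array([[0.2, 0.0], [0.0, 0.3], [0.1, -0.1]])
centered = r - mu

# Solve Sigma x = (r - mu) instead of forming the explicit inverse.
solved = np.linalg.solve(cov, centered.T).T

# Row-wise quadratic form (r - mu)^T Sigma^{-1} (r - mu).
d2 = np.einsum("ij,ij->i", centered, solved)
```

Solving the linear system is generally more stable than inverting the covariance matrix, which matters when the matrix is close to singular.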

This distance measure accounts for correlations between assets and is useful for:

  • Outlier detection in portfolio returns

  • Risk-adjusted distance calculations

  • Identifying unusual market regimes

Parameters:
X_test : array-like of shape (n_observations, n_assets) or (n_assets,)

Observations for which to compute the squared Mahalanobis distance. Each row represents one observation. If 1D, treated as a single observation. Assets with non-finite fitted variance are excluded from inference. After this asset-level filtering, each row is evaluated using the remaining available values only, covering row-level missing values such as market holidays or pre/post-listing. When rows have different observation patterns, the returned distances follow \(\chi^2\) distributions with different degrees of freedom. Rows with no finite retained observation return NaN.

Returns:
distances : ndarray of shape (n_observations,) or float

Squared Mahalanobis distance for each observation. Returns a scalar if input is 1D.

Examples

>>> import numpy as np
>>> from skfolio.moments import EmpiricalCovariance
>>> X = np.random.randn(100, 3)
>>> model = EmpiricalCovariance()
>>> model.fit(X)
>>> distances = model.mahalanobis(X)
>>> # Distances follow approximately chi-squared distribution with n_assets DoF
>>> print(f"Mean distance: {distances.mean():.2f}, Expected: {3:.2f}")

score(X_test, y=None)

Compute the mean log-likelihood of observations under the estimated model.

Evaluates how well the fitted covariance matrix explains new observations, assuming a multivariate Gaussian distribution. This is useful for:

  • Model selection (comparing different covariance estimators)

  • Cross-validation of covariance estimation methods

  • Assessing goodness-of-fit

The log-likelihood for a single observation \(r\) is:

\[\log p(r | \mu, \Sigma) = -\frac{1}{2} \left[ n \log(2\pi) + \log|\Sigma| + (r - \mu)^T \Sigma^{-1} (r - \mu) \right]\]

where \(n\) is the number of assets, \(\Sigma\) is the estimated covariance matrix (self.covariance_), and \(\mu\) is the estimated mean (self.location_ if available, otherwise zero).
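As an illustration of the formula, the sketch below computes the mean Gaussian log-likelihood with NumPy. The function name `mean_gaussian_log_likelihood` is hypothetical, and the sketch assumes complete data, so the exclusion of assets and missing values described in the parameters is not reproduced.

```python
import numpy as np


def mean_gaussian_log_likelihood(X, cov, mu=None):
    """Hypothetical helper: mean log-likelihood under N(mu, cov), no missing data."""
    X = np.atleast_2d(np.asarray(X, dtype=float))
    n = X.shape[1]
    mu = np.zeros(n) if mu is None else np.asarray(mu, dtype=float)
    centered = X - mu

    # log|Sigma| computed in a numerically stable way.
    _, logdet = np.linalg.slogdet(cov)

    # Squared Mahalanobis distance of each row.
    d2 = np.einsum("ij,ij->i", centered, np.linalg.solve(cov, centered.T).T)

    log_likelihood = -0.5 * (n * np.log(2 * np.pi) + logdet + d2)
    return float(log_likelihood.mean())


# With an identity covariance and an observation at the mean, only the
# normalisation term remains: -(n / 2) * log(2 * pi).
value = mean_gaussian_log_likelihood(np.zeros((1, 2)), np.eye(2))
```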

Parameters:
X_test : array-like of shape (n_observations, n_assets)

Observations for which to compute the log-likelihood. Typically held-out test data not used during fitting. Assets with non-finite fitted variance are excluded from inference. This typically happens when the fitted covariance cannot be estimated for an asset, for example before listing, after delisting, or during a warmup period. After this asset-level filtering, each row of X_test is scored using the remaining available values only. This covers row-level missing values in X_test, such as market holidays or pre/post-listing.

y : Ignored

Not used, present for scikit-learn API consistency.

Returns:
score : float

Mean log-likelihood of the observations. Higher values indicate better fit. The score is averaged over all observations.

Examples

>>> import numpy as np
>>> from skfolio.moments import EmpiricalCovariance, LedoitWolf
>>> X_train = np.random.randn(100, 5)
>>> X_test = np.random.randn(50, 5)
>>> emp = EmpiricalCovariance().fit(X_train)
>>> lw = LedoitWolf().fit(X_train)
>>> # Compare models on held-out data
>>> print(f"Empirical: {emp.score(X_test):.2f}")
>>> print(f"LedoitWolf: {lw.score(X_test):.2f}")

set_fit_request(*, implied_vol='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
implied_vol : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for implied_vol parameter in fit.

Returns:
self : object

The updated object.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:
**params : dict

Estimator parameters.

Returns:
self : estimator instance

Estimator instance.

set_score_request(*, X_test='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
X_test : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for X_test parameter in score.

Returns:
self : object

The updated object.