skfolio.moments.LedoitWolf

class skfolio.moments.LedoitWolf(store_precision=True, assume_centered=False, block_size=1000, nearest=True, higham=False, higham_max_iteration=100)

LedoitWolf Covariance Estimator.

Ledoit-Wolf is a particular form of shrinkage, where the shrinkage coefficient is computed using O. Ledoit and M. Wolf’s formula as described in [1].

Read more in the scikit-learn documentation.
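The shrinkage coefficient can be sketched in NumPy as below. This is a non-blocked sketch built from the two quantities of [1]: δ, the distance between the sample covariance and the target mu * I, and β, an estimate of the sampling error of the sample covariance. It is written to mirror scikit-learn's `sklearn.covariance.ledoit_wolf_shrinkage` as we understand it; that function accumulates the same sums in `block_size`-sized blocks to save memory, which is why `block_size` does not change the result. The name `ledoit_wolf_shrinkage_sketch` is hypothetical.

```python
import numpy as np


def ledoit_wolf_shrinkage_sketch(X, assume_centered=False):
    """Ledoit-Wolf shrinkage coefficient, computed without blocking (sketch)."""
    X = np.asarray(X, dtype=float)
    n_observations, n_assets = X.shape
    if n_assets == 1:
        # With a single asset the target mu * I equals cov, so the coefficient is irrelevant.
        return 0.0
    if not assume_centered:
        X = X - X.mean(axis=0)
    emp_cov = X.T @ X / n_observations
    mu = np.trace(emp_cov) / n_assets
    # delta: squared Frobenius distance between emp_cov and mu * I, divided by n_assets.
    delta = np.sum((emp_cov - mu * np.identity(n_assets)) ** 2) / n_assets
    # beta: estimated sampling variance of the entries of emp_cov, summed and divided by n_assets.
    X2 = X**2
    beta = np.sum(X2.T @ X2 / n_observations - emp_cov**2) / (n_assets * n_observations)
    # Cap beta at delta so the coefficient never exceeds 1.
    beta = min(beta, delta)
    return 0.0 if beta == 0 else float(beta / delta)


rng = np.random.default_rng(0)
returns = rng.standard_normal((60, 10))  # 60 observations of 10 synthetic assets
print(ledoit_wolf_shrinkage_sketch(returns))
```

To cross-check the sketch, compare its output with `sklearn.covariance.ledoit_wolf_shrinkage(returns)`; if the sketch matches the library, the two agree up to floating-point error.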

Parameters:

store_precision : bool, default=True: Specify whether the estimated precision is stored.
assume_centered : bool, default=False: If True, data will not be centered before computation. Useful when working with data whose mean is almost, but not exactly, zero. If False (default), data will be centered before computation.
block_size : int, default=1000: Size of blocks into which the covariance matrix will be split during its Ledoit-Wolf estimation. This is purely a memory optimization and does not affect results.
nearest : bool, default=True: If True, the covariance is replaced by the nearest covariance matrix that is positive definite and whose Cholesky decomposition can be computed. The variances are left unchanged. A covariance matrix that is not positive definite often occurs in high-dimensional problems. It can be caused by multicollinearity, floating-point inaccuracies, or a number of observations smaller than the number of assets. For more details, see cov_nearest.
higham : bool, default=False: If True, the Higham (2002) algorithm is used to find the nearest positive definite covariance; otherwise, the eigenvalues are clipped to a threshold above zero (1e-13). The default (False) uses the clipping method because the Higham algorithm can be slow for large datasets.
higham_max_iteration : int, default=100: Maximum number of iterations of the Higham (2002) algorithm.
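The default repair (`nearest=True`, `higham=False`) can be sketched as eigenvalue clipping. The sketch assumes the clipping is applied to the correlation matrix and the result is rescaled with the original standard deviations, which is one way to keep the variances unchanged as described above. `clip_to_nearest_pd` is a hypothetical helper, not skfolio's `cov_nearest`, and the Higham (2002) alternative is not shown.

```python
import numpy as np


def clip_to_nearest_pd(cov, threshold=1e-13):
    """Hypothetical helper: eigenvalue clipping that preserves the variances."""
    cov = np.asarray(cov, dtype=float)
    std = np.sqrt(np.diag(cov))
    # Work on the correlation matrix so that rescaling back restores the variances.
    corr = cov / np.outer(std, std)
    eigvals, eigvecs = np.linalg.eigh(corr)
    # Raise every eigenvalue to at least `threshold` (V @ diag(clipped) @ V^T).
    clipped = (eigvecs * np.maximum(eigvals, threshold)) @ eigvecs.T
    # Clipping changes the diagonal; renormalise it back to ones.
    d = np.sqrt(np.diag(clipped))
    clipped = clipped / np.outer(d, d)
    clipped = (clipped + clipped.T) / 2  # remove floating-point asymmetry
    return clipped * np.outer(std, std)


# A 3-asset "covariance" whose implied correlations are mutually inconsistent.
corr = np.array([[1.0, 0.9, 0.9], [0.9, 1.0, -0.9], [0.9, -0.9, 1.0]])
std = np.array([1.0, 2.0, 3.0])
bad_cov = corr * np.outer(std, std)
fixed_cov = clip_to_nearest_pd(bad_cov, threshold=1e-6)
np.linalg.cholesky(fixed_cov)  # succeeds; np.linalg.cholesky(bad_cov) would raise LinAlgError
```

Working in correlation space means the diagonal can be restored exactly after clipping, at the cost of slightly perturbing the off-diagonal entries.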

Attributes:

covariance_ : ndarray of shape (n_assets, n_assets): Estimated covariance.
location_ : ndarray of shape (n_assets,): Estimated location, i.e. the estimated mean.
precision_ : ndarray of shape (n_assets, n_assets): Estimated pseudo-inverse matrix (stored only if store_precision is True).
shrinkage_ : float: Coefficient in the convex combination used for the computation of the shrunk estimate. Range is [0, 1].
n_features_in_ : int: Number of assets seen during fit.
feature_names_in_ : ndarray of shape (n_features_in_,): Names of features seen during fit. Defined only when X has feature names that are all strings.

Methods

`error_norm`(comp_cov[, norm, scaling, squared])	Compute the Mean Squared Error between two covariance estimators.
`fit`(X[, y])	Fit the Ledoit-Wolf shrunk covariance model to X.
`get_metadata_routing`()	Get metadata routing of this object.
`get_params`([deep])	Get parameters for this estimator.
`get_precision`()	Getter for the precision matrix.
`mahalanobis`(X_test)	Compute the squared Mahalanobis distance of observations.
`score`(X_test[, y])	Compute the mean log-likelihood of observations under the estimated model.
`set_params`(**params)	Set the parameters of this estimator.
`set_score_request`(*[, X_test])	Configure whether metadata should be requested to be passed to the `score` method.

Notes

The regularised covariance is:

(1 - shrinkage) * cov + shrinkage * mu * np.identity(n_features)

where mu = trace(cov) / n_features and shrinkage is given by the Ledoit-Wolf formula (see References).
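Given a shrinkage coefficient, the regularised covariance above is a one-liner. The sketch below (the name `apply_shrinkage` is hypothetical) also shows two consequences of the formula: the trace is preserved, and every eigenvalue λ of cov moves to (1 - shrinkage) * λ + shrinkage * mu, so a singular sample covariance becomes positive definite for any shrinkage > 0 when mu > 0.

```python
import numpy as np


def apply_shrinkage(cov, shrinkage):
    """(1 - shrinkage) * cov + shrinkage * mu * I, with mu = trace(cov) / n_features."""
    cov = np.asarray(cov, dtype=float)
    n_features = cov.shape[0]
    mu = np.trace(cov) / n_features
    return (1.0 - shrinkage) * cov + shrinkage * mu * np.identity(n_features)


# A rank-deficient 2x2 covariance (perfectly correlated assets): not invertible.
singular_cov = np.array([[1.0, 1.0], [1.0, 1.0]])
shrunk = apply_shrinkage(singular_cov, 0.1)  # mu = 1, so shrunk = [[1.0, 0.9], [0.9, 1.0]]
print(np.linalg.eigvalsh(shrunk))  # eigenvalues approximately 0.1 and 1.9
```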

References

[1]

“A Well-Conditioned Estimator for Large-Dimensional Covariance Matrices”. Ledoit and Wolf, Journal of Multivariate Analysis, Volume 88, Issue 2, February 2004, pages 365-411.

error_norm(comp_cov, norm='frobenius', scaling=True, squared=True)

Compute the Mean Squared Error between two covariance estimators.

Parameters:

comp_cov : array-like of shape (n_features, n_features): The covariance to compare with.
norm : {“frobenius”, “spectral”}, default=“frobenius”: The type of norm used to compute the error. Available error types are ‘frobenius’ (default), sqrt(tr(A^t.A)), and ‘spectral’, sqrt(max(eigenvalues(A^t.A))), where A is the error (comp_cov - self.covariance_).
scaling : bool, default=True: If True (default), the squared error norm is divided by n_features. If False, the squared error norm is not rescaled.
squared : bool, default=True: Whether to compute the squared error norm or the error norm. If True (default), the squared error norm is returned. If False, the error norm is returned.

Returns:

result : float: The error between the self and comp_cov covariance estimators, measured with the selected norm (by default, the squared Frobenius norm divided by n_features).
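The way `norm`, `scaling` and `squared` combine can be sketched in NumPy from the parameter descriptions above; `error_norm_sketch` is a hypothetical stand-in, not the method itself. The spectral option uses the largest eigenvalue of AᵀA, i.e. the squared largest singular value of A.

```python
import numpy as np


def error_norm_sketch(covariance, comp_cov, norm="frobenius", scaling=True, squared=True):
    """Error between two covariance matrices, following the error_norm parameters."""
    error = np.asarray(comp_cov, dtype=float) - np.asarray(covariance, dtype=float)
    if norm == "frobenius":
        squared_norm = np.sum(error**2)  # tr(A^T A)
    elif norm == "spectral":
        squared_norm = np.max(np.linalg.eigvalsh(error.T @ error))  # max eigenvalue of A^T A
    else:
        raise ValueError("norm must be 'frobenius' or 'spectral'")
    if scaling:
        squared_norm = squared_norm / error.shape[0]  # divide by n_features
    return squared_norm if squared else np.sqrt(squared_norm)


cov = np.identity(2)
comp_cov = np.diag([3.0, 2.0])  # error A = diag(2, 1)
print(error_norm_sketch(cov, comp_cov))                   # (4 + 1) / 2 = 2.5
print(error_norm_sketch(cov, comp_cov, norm="spectral"))  # 4 / 2 = 2.0
```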

fit(X, y=None, **fit_params)

Fit the Ledoit-Wolf shrunk covariance model to X.

Parameters:

X : array-like of shape (n_observations, n_assets): Price returns of the assets.
y : Ignored: Not used, present for API consistency by convention.

Returns:

self : LedoitWolf: Fitted estimator.

get_metadata_routing()

Get metadata routing of this object.

Please check the User Guide on how the routing mechanism works.

Returns:

routing : MetadataRequest: A MetadataRequest encapsulating routing information.

get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep : bool, default=True: If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params : dict: Parameter names mapped to their values.

get_precision()

Getter for the precision matrix.

Returns:

precision_ : array-like of shape (n_features, n_features): The precision matrix associated with the current covariance object.
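Because precision_ is documented as a pseudo-inverse, it stays defined even when the covariance is singular. A minimal sketch, assuming a Moore-Penrose pseudo-inverse of a symmetric matrix: scikit-learn's covariance estimators use `scipy.linalg.pinvh` for this, to our understanding, and NumPy's `np.linalg.pinv(..., hermitian=True)` is the equivalent used here.

```python
import numpy as np

# A singular covariance (two perfectly correlated assets): np.linalg.inv would raise LinAlgError.
cov = np.array([[1.0, 1.0], [1.0, 1.0]])
# Moore-Penrose pseudo-inverse, exploiting symmetry (eigendecomposition-based).
precision = np.linalg.pinv(cov, hermitian=True)
# Defining property of the pseudo-inverse: cov @ precision @ cov reproduces cov.
print(np.allclose(cov @ precision @ cov, cov))
```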

mahalanobis(X_test)

Compute the squared Mahalanobis distance of observations.

The squared Mahalanobis distance of an observation $r$ is defined as:

\[d^2 = (r - \mu)^T \Sigma^{-1} (r - \mu)\]

where $\Sigma$ is the estimated covariance matrix (self.covariance_) and $\mu$ is the estimated mean (self.location_ if available, otherwise zero).

This distance measure accounts for correlations between assets and is useful for:

- Outlier detection in portfolio returns
- Risk-adjusted distance calculations
- Identifying unusual market regimes

Parameters:

X_test : array-like of shape (n_observations, n_assets) or (n_assets,): Observations for which to compute the squared Mahalanobis distance. Each row represents one observation. If 1D, treated as a single observation. Assets with non-finite fitted variance are excluded from inference. After this asset-level filtering, each row is evaluated using the remaining available values only, covering row-level missing values such as market holidays or pre/post-listing. When rows have different observation patterns, the returned distances follow $\chi^2$ distributions with different degrees of freedom. Rows with no finite retained observation return NaN.

Returns:

distances : ndarray of shape (n_observations,) or float: Squared Mahalanobis distance for each observation. Returns a scalar if input is 1D.
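The missing-value rules above can be sketched row by row: drop assets whose fitted variance is non-finite, then evaluate each row on its finite entries only, using the matching sub-block of the covariance. `squared_mahalanobis_sketch` is a hypothetical helper written from the description, not skfolio's implementation; it solves one linear system per row instead of using a stored precision matrix. Under a Gaussian model, the number of finite entries kept in a row is the degrees of freedom of that row's $\chi^2$ reference distribution.

```python
import numpy as np


def squared_mahalanobis_sketch(X_test, covariance, location=None):
    """Hypothetical NaN-aware squared Mahalanobis distance."""
    covariance = np.asarray(covariance, dtype=float)
    X = np.asarray(X_test, dtype=float)
    is_1d = X.ndim == 1
    X = np.atleast_2d(X)
    n_assets = covariance.shape[0]
    mu = np.zeros(n_assets) if location is None else np.asarray(location, dtype=float)

    # Asset-level filter: keep assets whose fitted variance is finite.
    keep = np.isfinite(np.diag(covariance))
    cov = covariance[np.ix_(keep, keep)]
    diffs = X[:, keep] - mu[keep]

    distances = np.full(X.shape[0], np.nan)
    for i, diff in enumerate(diffs):
        observed = np.isfinite(diff)  # row-level filter (holidays, pre/post-listing, ...)
        if not observed.any():
            continue  # no finite retained observation -> NaN
        r = diff[observed]
        distances[i] = r @ np.linalg.solve(cov[np.ix_(observed, observed)], r)
    return distances[0] if is_1d else distances


cov = np.diag([1.0, 4.0, np.nan])  # third asset: variance could not be estimated
X = np.array([[1.0, 2.0, 0.5], [2.0, np.nan, 0.5], [np.nan, np.nan, 0.5]])
print(squared_mahalanobis_sketch(X, cov))  # row 0: 1 + 4/4 = 2, row 1: 4, row 2: nan
```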

Examples

>>> import numpy as np
>>> from skfolio.moments import EmpiricalCovariance
>>> rng = np.random.default_rng(0)
>>> X = rng.standard_normal((100, 3))
>>> model = EmpiricalCovariance().fit(X)
>>> distances = model.mahalanobis(X)
>>> # Squared distances approximately follow a chi-squared distribution with n_assets = 3 DoF (mean 3)
>>> print(f"Mean distance: {distances.mean():.2f}, Expected: {3:.2f}")

score(X_test, y=None)

Compute the mean log-likelihood of observations under the estimated model.

Evaluates how well the fitted covariance matrix explains new observations, assuming a multivariate Gaussian distribution. This is useful for:

- Model selection (comparing different covariance estimators)
- Cross-validation of covariance estimation methods
- Assessing goodness-of-fit

The log-likelihood for a single observation $r$ is:

\[\log p(r | \mu, \Sigma) = -\frac{1}{2} \left[ n \log(2\pi) + \log|\Sigma| + (r - \mu)^T \Sigma^{-1} (r - \mu) \right]\]

where $n$ is the number of assets, $\Sigma$ is the estimated covariance matrix (self.covariance_), and $\mu$ is the estimated mean (self.location_ if available, otherwise zero).
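For complete data, averaging the formula above over the rows of X_test can be sketched as follows. `mean_log_likelihood_sketch` is a hypothetical helper, and the missing-value handling described under X_test is deliberately omitted. It uses `np.linalg.slogdet` for $\log|\Sigma|$ and a linear solve for the quadratic term rather than forming $\Sigma^{-1}$ explicitly.

```python
import numpy as np


def mean_log_likelihood_sketch(X_test, covariance, location=None):
    """Mean Gaussian log-likelihood of complete observations (hypothetical helper)."""
    X = np.atleast_2d(np.asarray(X_test, dtype=float))
    covariance = np.asarray(covariance, dtype=float)
    n = covariance.shape[0]  # number of assets
    mu = np.zeros(n) if location is None else np.asarray(location, dtype=float)
    diffs = X - mu
    sign, log_det = np.linalg.slogdet(covariance)  # log|Sigma|, computed stably
    if sign <= 0:
        raise ValueError("covariance must have a positive determinant")
    # (r - mu)^T Sigma^{-1} (r - mu) for every row, without inverting Sigma.
    quad = np.sum(diffs * np.linalg.solve(covariance, diffs.T).T, axis=1)
    log_p = -0.5 * (n * np.log(2.0 * np.pi) + log_det + quad)
    return log_p.mean()


X_test = np.array([[0.0, 0.0], [1.0, 1.0]])
print(mean_log_likelihood_sketch(X_test, np.identity(2)))  # equals -log(2*pi) - 0.5
```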

Parameters:

X_test : array-like of shape (n_observations, n_assets): Observations for which to compute the log-likelihood. Typically held-out test data not used during fitting. Assets with non-finite fitted variance are excluded from inference. This typically happens when the fitted covariance cannot be estimated for an asset, for example before listing, after delisting, or during a warmup period. After this asset-level filtering, each row of X_test is scored using the remaining available values only. This covers row-level missing values in X_test, such as market holidays or pre/post-listing.
y : Ignored: Not used, present for scikit-learn API consistency.

Returns:

score : float: Mean log-likelihood of the observations. Higher values indicate a better fit. The score is averaged over all observations.

Examples

>>> import numpy as np
>>> from skfolio.moments import EmpiricalCovariance, LedoitWolf
>>> X_train = np.random.randn(100, 5)
>>> X_test = np.random.randn(50, 5)
>>> emp = EmpiricalCovariance().fit(X_train)
>>> lw = LedoitWolf().fit(X_train)
>>> # Compare models on held-out data
>>> print(f"Empirical: {emp.score(X_test):.2f}")
>>> print(f"LedoitWolf: {lw.score(X_test):.2f}")

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
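The `<component>__<parameter>` convention amounts to splitting each key on its first double underscore and forwarding the remainder to the named component; scikit-learn's `BaseEstimator.set_params` performs a split of this kind before delegating. The helper below and the component names `cov` and `pipe` are hypothetical, and the validation the real method performs is left out.

```python
def split_nested_params(params):
    """Hypothetical helper: separate own parameters from <component>__<parameter> ones."""
    own, nested = {}, {}
    for key, value in params.items():
        component, delim, sub_key = key.partition("__")  # split on the first "__" only
        if delim:
            nested.setdefault(component, {})[sub_key] = value
        else:
            own[key] = value
    return own, nested


own, nested = split_nested_params(
    {"nearest": False, "cov__higham": True, "pipe__cov__block_size": 500}
)
# own    == {"nearest": False}
# nested == {"cov": {"higham": True}, "pipe": {"cov__block_size": 500}}
```

Splitting only on the first `__` is what lets deeper nesting work: `pipe` receives `cov__block_size` and can apply the same split again.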

Parameters:

**params : dict: Estimator parameters.

Returns:

self : estimator instance: Estimator instance.

set_score_request(*, X_test='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

- True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to score.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.
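The four request values listed above can be read as a small decision table. The toy function `route_x_test` below encodes those rules for the single X_test key; it is hypothetical and is not how scikit-learn's routing machinery is implemented.

```python
def route_x_test(request, provided):
    """Toy model: which X_test value (if any) a meta-estimator forwards to score()."""
    if request is True:
        # Requested: forwarded when provided, silently skipped otherwise.
        return {"X_test": provided["X_test"]} if "X_test" in provided else {}
    if request is False:
        return {}  # not requested: never forwarded
    if request is None:
        if "X_test" in provided:
            raise ValueError("X_test was provided but no request was set")
        return {}
    if isinstance(request, str):
        # Alias: the caller supplies the value under `request`; score() receives it as X_test.
        return {"X_test": provided[request]} if request in provided else {}
    raise TypeError("request must be True, False, None or a str alias")


print(route_x_test("holdout", {"holdout": [[0.1, 0.2]]}))
```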

Parameters:

X_test : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED: Metadata routing for X_test parameter in score.

Returns:

self : object: The updated object.