skfolio.moments.OAS#

class skfolio.moments.OAS(store_precision=True, assume_centered=False, nearest=True, higham=False, higham_max_iteration=100)[source]#

Oracle Approximating Shrinkage Estimator as proposed in [1].

Read more in the scikit-learn documentation.

Parameters:
store_precision : bool, default=True

Specify if the estimated precision is stored.

assume_centered : bool, default=False

If True, data will not be centered before computation. Useful when working with data whose mean is almost, but not exactly, zero. If False (default), data will be centered before computation.

nearest : bool, default=True

If True, the fitted covariance is replaced by the nearest covariance matrix that is positive definite and for which a Cholesky decomposition can be computed; the variances are left unchanged. Non-positive-definite covariance matrices often arise in high-dimensional problems, for example from multicollinearity or when the number of observations is smaller than the number of assets.

higham : bool, default=False

If True, the Higham & Nick (2002) algorithm is used to find the nearest positive-definite covariance matrix; otherwise, negative eigenvalues are clipped to a small positive threshold, which is faster but less precise.

higham_max_iteration : int, default=100

Maximum number of iterations of the Higham & Nick (2002) algorithm.

Attributes:
covariance_ : ndarray of shape (n_assets, n_assets)

Estimated covariance matrix.

location_ : ndarray of shape (n_assets,)

Estimated location, i.e. the estimated mean.

precision_ : ndarray of shape (n_assets, n_assets)

Estimated pseudo-inverse matrix (stored only if store_precision is True).

shrinkage_ : float

Coefficient in the convex combination used to compute the shrunk estimate. Range is [0, 1].

n_features_in_ : int

Number of assets seen during fit.

feature_names_in_ : ndarray of shape (n_features_in_,)

Names of features seen during fit. Defined only when X has feature names that are all strings.

Methods

error_norm(comp_cov[, norm, scaling, squared])

Compute the Mean Squared Error between two covariance estimators.

fit(X[, y])

Fit the Oracle Approximating Shrinkage covariance model to X.

get_metadata_routing()

Get metadata routing of this object.

get_params([deep])

Get parameters for this estimator.

get_precision()

Getter for the precision matrix.

mahalanobis(X_test)

Compute the squared Mahalanobis distance of observations.

score(X_test[, y])

Compute the mean log-likelihood of observations under the estimated model.

set_params(**params)

Set the parameters of this estimator.

set_score_request(*[, X_test])

Configure whether metadata should be requested to be passed to the score method.

Notes

The regularised covariance is:

(1 - shrinkage) * cov + shrinkage * mu * np.identity(n_features),

where mu = trace(cov) / n_features and shrinkage is given by the OAS formula (see [1]).

The shrinkage formulation implemented here differs from Eq. 23 in [1]. In the original article, formula (23) states that 2/p (p being the number of features) is multiplied by Trace(cov*cov) in both the numerator and denominator, but this operation is omitted because for a large p, the value of 2/p is so small that it doesn’t affect the value of the estimator.
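The convex combination above can be reproduced directly with NumPy. A minimal sketch follows; the shrinkage coefficient `s` is a placeholder value here, not the data-driven OAS-optimal coefficient:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 3))

# Empirical covariance of the data
cov = np.cov(X, rowvar=False, bias=True)

n_features = cov.shape[0]
mu = np.trace(cov) / n_features  # average variance across assets

s = 0.2  # placeholder shrinkage; OAS computes this coefficient from the data
shrunk = (1 - s) * cov + s * mu * np.identity(n_features)

# Shrinking toward mu * identity preserves the total variance (the trace):
# trace(shrunk) = (1 - s) * trace(cov) + s * mu * n_features = trace(cov)
print(np.isclose(np.trace(shrunk), np.trace(cov)))  # True
```

Note that shrinkage pulls the off-diagonal entries toward zero and the variances toward their average, which regularizes ill-conditioned empirical covariance matrices.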

References

[1]

“Shrinkage algorithms for MMSE covariance estimation”. Chen, Y., Wiesel, A., Eldar, Y. C., & Hero, A. O. IEEE Transactions on Signal Processing, 58(10), 5016-5029, 2010.

error_norm(comp_cov, norm='frobenius', scaling=True, squared=True)#

Compute the Mean Squared Error between two covariance estimators.

Parameters:
comp_cov : array-like of shape (n_features, n_features)

The covariance to compare with.

norm : {"frobenius", "spectral"}, default="frobenius"

The type of norm used to compute the error. Available error types:

  • 'frobenius' (default): sqrt(tr(A^t.A))

  • 'spectral': sqrt(max(eigenvalues(A^t.A)))

where A is the error (comp_cov - self.covariance_).

scaling : bool, default=True

If True (default), the squared error norm is divided by n_features. If False, the squared error norm is not rescaled.

squared : bool, default=True

Whether to compute the squared error norm or the error norm. If True (default), the squared error norm is returned. If False, the error norm is returned.

Returns:
result : float

The Mean Squared Error (in the sense of the Frobenius norm) between self and comp_cov covariance estimators.
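The default Frobenius-norm error can be reproduced by hand. A sketch in plain NumPy, where `cov` and `comp_cov` are illustrative matrices standing in for `self.covariance_` and the comparison covariance:

```python
import numpy as np

cov = np.array([[1.0, 0.2], [0.2, 1.0]])       # stands in for self.covariance_
comp_cov = np.array([[1.1, 0.1], [0.1, 0.9]])  # covariance to compare with

A = comp_cov - cov            # the error matrix
sq_norm = np.trace(A.T @ A)   # squared Frobenius norm: tr(A^t.A)
n_features = cov.shape[0]

# error_norm defaults: scaling=True divides by n_features,
# squared=True returns the squared norm
result = sq_norm / n_features
print(result)  # ~0.02
```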

fit(X, y=None)[source]#

Fit the Oracle Approximating Shrinkage covariance model to X.

Parameters:
X : array-like of shape (n_observations, n_assets)

Price returns of the assets.

y : Ignored

Not used, present for API consistency by convention.

Returns:
self : OAS

Fitted estimator.
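A usage sketch of the fit/inspect workflow. It is shown with scikit-learn's OAS estimator, which this class is based on and which exposes the same fit(X), covariance_, shrinkage_, and precision_ interface described above; random data stands in for real asset returns:

```python
import numpy as np
from sklearn.covariance import OAS  # scikit-learn estimator underlying this class

rng = np.random.default_rng(42)
X = rng.standard_normal((250, 4))  # 250 observations of 4 assets (illustrative)

model = OAS(store_precision=True, assume_centered=False)
model.fit(X)

print(model.covariance_.shape)          # (4, 4)
print(0.0 <= model.shrinkage_ <= 1.0)   # True: shrinkage lies in [0, 1]
```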

get_metadata_routing()#

Get metadata routing of this object.

Please check User Guide on how the routing mechanism works.

Returns:
routing : MetadataRequest

A MetadataRequest encapsulating routing information.

get_params(deep=True)#

Get parameters for this estimator.

Parameters:
deep : bool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
params : dict

Parameter names mapped to their values.

get_precision()#

Getter for the precision matrix.

Returns:
precision_ : array-like of shape (n_features, n_features)

The precision matrix associated to the current covariance object.
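As noted under the precision_ attribute, the precision matrix is the pseudo-inverse of the estimated covariance. The relationship can be sketched with NumPy, using an illustrative covariance matrix in place of a fitted one:

```python
import numpy as np

cov = np.array([[1.0, 0.3], [0.3, 2.0]])  # illustrative fitted covariance
precision = np.linalg.pinv(cov)           # pseudo-inverse, as stored in precision_

# For an invertible covariance, precision @ cov recovers the identity
print(np.allclose(precision @ cov, np.eye(2)))  # True
```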

mahalanobis(X_test)#

Compute the squared Mahalanobis distance of observations.

The squared Mahalanobis distance of an observation \(r\) is defined as:

\[d^2 = (r - \mu)^T \Sigma^{-1} (r - \mu)\]

where \(\Sigma\) is the estimated covariance matrix (self.covariance_) and \(\mu\) is the estimated mean (self.location_ if available, otherwise zero).

This distance measure accounts for correlations between assets and is useful for:

  • Outlier detection in portfolio returns

  • Risk-adjusted distance calculations

  • Identifying unusual market regimes

Parameters:
X_test : array-like of shape (n_observations, n_assets) or (n_assets,)

Observations for which to compute the squared Mahalanobis distance. Each row represents one observation. If 1D, treated as a single observation. Assets with non-finite fitted variance are excluded from inference. After this asset-level filtering, each row is evaluated using the remaining available values only, covering row-level missing values such as market holidays or pre/post-listing. When rows have different observation patterns, the returned distances follow \(\chi^2\) distributions with different degrees of freedom. Rows with no finite retained observation return NaN.

Returns:
distances : ndarray of shape (n_observations,) or float

Squared Mahalanobis distance for each observation. Returns a scalar if the input is 1D.

Examples

>>> import numpy as np
>>> from skfolio.moments import EmpiricalCovariance
>>> X = np.random.randn(100, 3)
>>> model = EmpiricalCovariance().fit(X)
>>> distances = model.mahalanobis(X)
>>> # Distances follow approximately chi-squared distribution with n_assets DoF
>>> print(f"Mean distance: {distances.mean():.2f}, Expected: {X.shape[1]:.2f}")
score(X_test, y=None)#

Compute the mean log-likelihood of observations under the estimated model.

Evaluates how well the fitted covariance matrix explains new observations, assuming a multivariate Gaussian distribution. This is useful for:

  • Model selection (comparing different covariance estimators)

  • Cross-validation of covariance estimation methods

  • Assessing goodness-of-fit

The log-likelihood for a single observation \(r\) is:

\[\log p(r | \mu, \Sigma) = -\frac{1}{2} \left[ n \log(2\pi) + \log|\Sigma| + (r - \mu)^T \Sigma^{-1} (r - \mu) \right]\]

where \(n\) is the number of assets, \(\Sigma\) is the estimated covariance matrix (self.covariance_), and \(\mu\) is the estimated mean (self.location_ if available, otherwise zero).

Parameters:
X_test : array-like of shape (n_observations, n_assets)

Observations for which to compute the log-likelihood. Typically held-out test data not used during fitting. Assets with non-finite fitted variance are excluded from inference. This typically happens when the fitted covariance cannot be estimated for an asset, for example before listing, after delisting, or during a warmup period. After this asset-level filtering, each row of X_test is scored using the remaining available values only. This covers row-level missing values in X_test, such as market holidays or pre/post-listing.

y : Ignored

Not used, present for scikit-learn API consistency.

Returns:
score : float

Mean log-likelihood of the observations. Higher values indicate a better fit. The score is averaged over all observations.

Examples

>>> import numpy as np
>>> from skfolio.moments import EmpiricalCovariance, LedoitWolf
>>> X_train = np.random.randn(100, 5)
>>> X_test = np.random.randn(50, 5)
>>> emp = EmpiricalCovariance().fit(X_train)
>>> lw = LedoitWolf().fit(X_train)
>>> # Compare models on held-out data
>>> print(f"Empirical: {emp.score(X_test):.2f}")
>>> print(f"LedoitWolf: {lw.score(X_test):.2f}")
set_params(**params)#

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:
**params : dict

Estimator parameters.

Returns:
self : estimator instance

Estimator instance.
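A sketch of the get_params/set_params round trip, shown with scikit-learn's OAS estimator, which shares the store_precision and assume_centered parameters listed above:

```python
from sklearn.covariance import OAS

model = OAS()
print(model.get_params()["assume_centered"])  # False (the default)

# set_params returns self, so calls can be chained
model.set_params(assume_centered=True, store_precision=False)
print(model.get_params()["assume_centered"])  # True
```

The same `<component>__<parameter>` convention applies when the estimator is nested inside a Pipeline or meta-estimator.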

set_score_request(*, X_test='$UNCHANGED$')#

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
X_test : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for the X_test parameter in score.

Returns:
self : object

The updated object.
The updated object.