skfolio.moments.DetoneCovariance
- class skfolio.moments.DetoneCovariance(covariance_estimator=None, n_markets=1, nearest=True, higham=False, higham_max_iteration=100)
Covariance Detoning estimator.
Financial covariance matrices usually contain a market component corresponding to the first eigenvectors [1]. For some applications, such as clustering, removing the market component (the loud tone) allows a greater portion of the covariance to be explained by components that affect specific subsets of the securities.
- Parameters:
- covariance_estimator : BaseCovariance, optional
Covariance estimator used to estimate the covariance matrix prior to detoning. The default (None) is to use EmpiricalCovariance.
- n_markets : int, default=1
Number of eigenvectors related to the market. The default value is 1.
- nearest : bool, default=True
If this is set to True, the covariance is replaced by the nearest covariance matrix that is positive definite and has a Cholesky decomposition that can be computed. The variance is left unchanged. A covariance matrix that is not positive definite often occurs in high-dimensional problems. It can be due to multicollinearity, floating-point inaccuracies, or the number of observations being smaller than the number of assets. For more details, see cov_nearest. The default is True.
- higham : bool, default=False
If this is set to True, the Higham (2002) algorithm is used to find the nearest PD covariance; otherwise the eigenvalues are clipped to a threshold above zero (1e-13). The default is False and uses the clipping method, as the Higham algorithm can be slow for large datasets.
- higham_max_iteration : int, default=100
Maximum number of iterations of the Higham (2002) algorithm. The default value is 100.
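The eigenvalue-clipping approach described for nearest with higham=False can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not skfolio's cov_nearest implementation: the function name clip_to_positive_definite is hypothetical, and working on the correlation matrix and then rescaling so that the variances are unchanged is one plausible reading of the description above.

```python
import numpy as np


def clip_to_positive_definite(cov, threshold=1e-13):
    """Hypothetical helper: clip eigenvalues and keep the variances unchanged."""
    cov = np.asarray(cov, dtype=float)
    std = np.sqrt(np.diag(cov))
    corr = cov / np.outer(std, std)

    # eigh is appropriate for symmetric matrices; eigenvalues come back ascending.
    eigenvalues, eigenvectors = np.linalg.eigh(corr)
    clipped = np.clip(eigenvalues, threshold, None)
    corr_pd = (eigenvectors * clipped) @ eigenvectors.T

    # Clipping changes the diagonal, so rescale back to a unit diagonal.
    scale = np.sqrt(np.diag(corr_pd))
    corr_pd = corr_pd / np.outer(scale, scale)

    # Convert back to a covariance with the original standard deviations.
    return corr_pd * np.outer(std, std)


# Mutually inconsistent correlations: one eigenvalue of this matrix is negative.
std = np.array([0.1, 0.2, 0.3])
bad_corr = np.array([[1.0, 0.9, -0.9], [0.9, 1.0, 0.9], [-0.9, 0.9, 1.0]])
bad_cov = bad_corr * np.outer(std, std)

# A larger threshold than 1e-13 is used here only to keep the demo numerically robust.
fixed_cov = clip_to_positive_definite(bad_cov, threshold=1e-8)
print("smallest eigenvalue before:", np.linalg.eigvalsh(bad_cov).min())
print("smallest eigenvalue after: ", np.linalg.eigvalsh(fixed_cov).min())
```

Clipping is a single eigendecomposition, which is consistent with the note above that it is the cheaper default compared with the iterative Higham algorithm.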
- Attributes:
- covariance_ : ndarray of shape (n_assets, n_assets)
Estimated covariance.
- covariance_estimator_ : BaseCovariance
Fitted covariance_estimator.
- n_features_in_ : int
Number of assets seen during fit.
- feature_names_in_ : ndarray of shape (n_features_in_,)
Names of assets seen during fit. Defined only when X has asset names that are all strings.
Methods
fit(X[, y]): Fit the Covariance Detoning estimator.
get_metadata_routing(): Get metadata routing of this object.
get_params([deep]): Get parameters for this estimator.
mahalanobis(X_test): Compute the squared Mahalanobis distance of observations.
score(X_test[, y]): Compute the mean log-likelihood of observations under the estimated model.
set_params(**params): Set the parameters of this estimator.
set_score_request(*[, X_test]): Configure whether metadata should be requested to be passed to the score method.
References
[1] “Machine Learning for Asset Managers”. Elements in Quantitative Finance. López de Prado (2020).
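Examples
The detoning idea described at the top of this page can be sketched directly in NumPy. This is an illustrative reading of the procedure in [1], not skfolio's implementation: the helper names detone_covariance and mean_off_diagonal_correlation are hypothetical, and the simulated returns are an assumption made only for the demo.

```python
import numpy as np


def detone_covariance(cov, n_markets=1):
    """Hypothetical helper: remove the top n_markets eigen-components."""
    cov = np.asarray(cov, dtype=float)
    std = np.sqrt(np.diag(cov))
    corr = cov / np.outer(std, std)

    # eigh returns eigenvalues in ascending order, so the market
    # components are the last n_markets columns.
    eigenvalues, eigenvectors = np.linalg.eigh(corr)
    market_values = eigenvalues[-n_markets:]
    market_vectors = eigenvectors[:, -n_markets:]

    # Subtract the market component, then rescale to a unit diagonal.
    residual = corr - (market_vectors * market_values) @ market_vectors.T
    scale = np.sqrt(np.diag(residual))
    detoned_corr = residual / np.outer(scale, scale)

    # Restore the original variances.
    return detoned_corr * np.outer(std, std)


def mean_off_diagonal_correlation(cov):
    """Average pairwise correlation implied by a covariance matrix."""
    std = np.sqrt(np.diag(cov))
    corr = cov / np.outer(std, std)
    n = corr.shape[0]
    return (corr.sum() - n) / (n * (n - 1))


# Simulated returns: one common "market" factor plus idiosyncratic noise.
rng = np.random.default_rng(0)
market = rng.standard_normal(500)
X = market[:, None] + 0.5 * rng.standard_normal((500, 4))

cov = np.cov(X, rowvar=False)
detoned = detone_covariance(cov, n_markets=1)

print("mean correlation before:", mean_off_diagonal_correlation(cov))
print("mean correlation after: ", mean_off_diagonal_correlation(detoned))
print("smallest eigenvalue after:", np.linalg.eigvalsh(detoned).min())
```

Removing an eigen-component leaves the matrix singular (its smallest eigenvalue is numerically zero), which is a likely reason the estimator offers the nearest option to restore positive definiteness.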
- fit(X, y=None, **fit_params)
Fit the Covariance Detoning estimator.
- Parameters:
- X : array-like of shape (n_observations, n_assets)
Price returns of the assets.
- y : Ignored
Not used, present for API consistency by convention.
- **fit_params : dict
Parameters to pass to the underlying estimators. Only available if enable_metadata_routing=True, which can be set by using sklearn.set_config(enable_metadata_routing=True). See the Metadata Routing User Guide for more details.
- Returns:
- self : DetoneCovariance
Fitted estimator.
- get_metadata_routing()
Get metadata routing of this object.
Please check the User Guide on how the routing mechanism works.
- Returns:
- routing : MetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)
Get parameters for this estimator.
- Parameters:
- deep : bool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- params : dict
Parameter names mapped to their values.
- mahalanobis(X_test)
Compute the squared Mahalanobis distance of observations.
The squared Mahalanobis distance of an observation \(r\) is defined as:
\[d^2 = (r - \mu)^T \Sigma^{-1} (r - \mu)\]
where \(\Sigma\) is the estimated covariance matrix (self.covariance_) and \(\mu\) is the estimated mean (self.location_ if available, otherwise zero).
This distance measure accounts for correlations between assets and is useful for:
- Outlier detection in portfolio returns
- Risk-adjusted distance calculations
- Identifying unusual market regimes
- Parameters:
- X_test : array-like of shape (n_observations, n_assets) or (n_assets,)
Observations for which to compute the squared Mahalanobis distance. Each row represents one observation. If 1D, treated as a single observation. Assets with non-finite fitted variance are excluded from inference. After this asset-level filtering, each row is evaluated using the remaining available values only, covering row-level missing values such as market holidays or pre/post-listing. When rows have different observation patterns, the returned distances follow \(\chi^2\) distributions with different degrees of freedom. Rows with no finite retained observation return NaN.
- Returns:
- distances : ndarray of shape (n_observations,) or float
Squared Mahalanobis distance for each observation. Returns a scalar if input is 1D.
Examples
>>> import numpy as np
>>> from skfolio.moments import EmpiricalCovariance
>>> X = np.random.randn(100, 3)
>>> model = EmpiricalCovariance()
>>> model.fit(X)
>>> distances = model.mahalanobis(X)
>>> # Distances approximately follow a chi-squared distribution with n_assets degrees of freedom
>>> print(f"Mean distance: {distances.mean():.2f}, Expected: {3:.2f}")
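To make the formula and the row-level missing-value rule concrete, here is a minimal NumPy sketch. It is not skfolio's code: the function name squared_mahalanobis is hypothetical, and restricting each row to the sub-covariance of its finite entries is an assumed reading of "evaluated using the remaining available values only".

```python
import numpy as np


def squared_mahalanobis(X, cov, mean=None):
    """Hypothetical helper: d^2 = (r - mu)^T Sigma^{-1} (r - mu), row by row."""
    X = np.asarray(X, dtype=float)
    single = X.ndim == 1
    rows = np.atleast_2d(X)
    cov = np.asarray(cov, dtype=float)
    mean = np.zeros(cov.shape[0]) if mean is None else np.asarray(mean, dtype=float)

    distances = np.full(rows.shape[0], np.nan)
    for i, row in enumerate(rows):
        keep = np.isfinite(row)
        if not keep.any():
            continue  # no usable value in this row: leave NaN
        diff = row[keep] - mean[keep]
        sub_cov = cov[np.ix_(keep, keep)]
        # Solve a linear system instead of forming the inverse explicitly.
        distances[i] = diff @ np.linalg.solve(sub_cov, diff)

    return distances[0] if single else distances


cov = np.eye(2)
X_test = np.array([[3.0, 4.0], [np.nan, 2.0], [np.nan, np.nan]])
# With an identity covariance and zero mean the distance is the squared norm
# of the finite entries: 25 for the first row, 4 for the second, NaN for the third.
print(squared_mahalanobis(X_test, cov))
```

Because the second row uses one asset and the first uses two, their distances are comparable only after accounting for the different degrees of freedom, as noted in the X_test description.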
- score(X_test, y=None)
Compute the mean log-likelihood of observations under the estimated model.
Evaluates how well the fitted covariance matrix explains new observations, assuming a multivariate Gaussian distribution. This is useful for:
- Model selection (comparing different covariance estimators)
- Cross-validation of covariance estimation methods
- Assessing goodness-of-fit
The log-likelihood for a single observation \(r\) is:
\[\log p(r | \mu, \Sigma) = -\frac{1}{2} \left[ n \log(2\pi) + \log|\Sigma| + (r - \mu)^T \Sigma^{-1} (r - \mu) \right]\]
where \(n\) is the number of assets, \(\Sigma\) is the estimated covariance matrix (self.covariance_), and \(\mu\) is the estimated mean (self.location_ if available, otherwise zero).
- Parameters:
- X_test : array-like of shape (n_observations, n_assets)
Observations for which to compute the log-likelihood. Typically held-out test data not used during fitting. Assets with non-finite fitted variance are excluded from inference. This typically happens when the fitted covariance cannot be estimated for an asset, for example before listing, after delisting, or during a warmup period. After this asset-level filtering, each row of X_test is scored using the remaining available values only. This covers row-level missing values in X_test, such as market holidays or pre/post-listing.
- y : Ignored
Not used, present for scikit-learn API consistency.
- Returns:
- score : float
Mean log-likelihood of the observations. Higher values indicate better fit. The score is averaged over all observations.
Examples
>>> import numpy as np
>>> from skfolio.moments import EmpiricalCovariance, LedoitWolf
>>> X_train = np.random.randn(100, 5)
>>> X_test = np.random.randn(50, 5)
>>> emp = EmpiricalCovariance().fit(X_train)
>>> lw = LedoitWolf().fit(X_train)
>>> # Compare models on held-out data
>>> print(f"Empirical: {emp.score(X_test):.2f}")
>>> print(f"LedoitWolf: {lw.score(X_test):.2f}")
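The log-likelihood formula can also be evaluated by hand, which may help when checking a score value. The sketch below is a plain NumPy transcription of that formula, not skfolio's implementation: the function name mean_gaussian_log_likelihood is hypothetical, and it assumes complete data (no missing values or excluded assets).

```python
import numpy as np


def mean_gaussian_log_likelihood(X, cov, mean=None):
    """Hypothetical helper: average Gaussian log-likelihood over the rows of X."""
    X = np.atleast_2d(np.asarray(X, dtype=float))
    cov = np.asarray(cov, dtype=float)
    n_assets = cov.shape[0]
    mean = np.zeros(n_assets) if mean is None else np.asarray(mean, dtype=float)

    # slogdet is more stable than log(det(cov)) for small determinants.
    sign, log_det = np.linalg.slogdet(cov)
    if sign <= 0:
        raise ValueError("cov must be positive definite")

    diff = X - mean
    # Row-wise (r - mu)^T Sigma^{-1} (r - mu) without forming the inverse.
    mahalanobis_sq = np.einsum("ij,ij->i", diff, np.linalg.solve(cov, diff.T).T)
    log_likelihood = -0.5 * (n_assets * np.log(2 * np.pi) + log_det + mahalanobis_sq)
    return float(log_likelihood.mean())


# Simulated held-out data drawn from a known, correlated covariance.
rng = np.random.default_rng(0)
true_cov = np.array([[1.0, 0.8], [0.8, 1.0]])
X_test = rng.multivariate_normal(np.zeros(2), true_cov, size=2000)

score_true = mean_gaussian_log_likelihood(X_test, true_cov)
score_identity = mean_gaussian_log_likelihood(X_test, np.eye(2))
print(f"True covariance:     {score_true:.3f}")
print(f"Identity covariance: {score_identity:.3f}")
```

In this simulation the covariance that generated the data scores higher than the misspecified identity matrix, which is the behaviour that makes the score usable for model selection.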
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
- **params : dict
Estimator parameters.
- Returns:
- self : estimator instance
Estimator instance.
- set_score_request(*, X_test='$UNCHANGED$')
Configure whether metadata should be requested to be passed to the score method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to score.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- Parameters:
- X_test : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for the X_test parameter in score.
- Returns:
- self : object
The updated object.