skfolio.moments.GerberCovariance#
- class skfolio.moments.GerberCovariance(window_size=None, threshold=0.5, psd_variant=True, nearest=True, higham=False, higham_max_iteration=100)[source]#
Gerber Covariance estimator.
Robust co-movement measure which ignores fluctuations below a certain threshold while simultaneously limiting the effects of extreme movements. The Gerber statistic extends Kendall’s Tau by counting the proportion of simultaneous co-movements in series when their amplitudes exceed data-dependent thresholds.
Three variants have been published:
Gerber et al. (2015): tends to produce matrices that are non-PSD.
Gerber et al. (2019): alteration of the denominator of the above statistic.
Gerber et al. (2022): final alteration to ensure a PSD matrix.
The last two variants are implemented.
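The counting idea behind the statistic can be sketched in plain NumPy. This is an illustration only, not skfolio's implementation: the function name is made up for this sketch, and the denominator used here (concordant plus discordant pairs) is a simplified choice rather than the exact published 2019 or 2022 form.

```python
import numpy as np

def gerber_comovement_sketch(X, threshold=0.5):
    """Illustrative co-movement counting in the spirit of the Gerber statistic.

    For each asset pair, count observations where both returns exceed
    `threshold` standard deviations in magnitude: same-sign (concordant)
    pairs count positively, opposite-sign (discordant) pairs negatively.
    NOT skfolio's implementation; the denominator is a simplified choice.
    """
    X = np.asarray(X, dtype=float)
    sigma = X.std(axis=0)
    up = X >= threshold * sigma      # return exceeds the upper threshold
    down = X <= -threshold * sigma   # return exceeds the lower threshold
    # Concordant: both up or both down; discordant: one up, one down.
    conc = up.T.astype(float) @ up + down.T.astype(float) @ down
    disc = up.T.astype(float) @ down + down.T.astype(float) @ up
    denom = conc + disc
    with np.errstate(invalid="ignore", divide="ignore"):
        g = np.where(denom > 0, (conc - disc) / denom, 0.0)
    return g

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 3))
G = gerber_comovement_sketch(X)
# G is symmetric with unit diagonal and entries in [-1, 1],
# like a correlation matrix.
```

Small co-movements (below the threshold) contribute nothing, which is what makes the measure robust to noise, while counting rather than multiplying amplitudes limits the influence of extreme moves.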
- Parameters:
- window_size : int, optional
Window size. The model is fitted on the last window_size observations. The default (None) is to use all the data.
- threshold : float, default=0.5
Gerber threshold. The default value is 0.5.
- psd_variant : bool, default=True
If set to True, the Gerber et al. (2022) variant is used to ensure a positive semi-definite matrix. Otherwise, the Gerber et al. (2019) variant is used. The default is True.
- nearest : bool, default=True
If set to True, the covariance is replaced by the nearest covariance matrix that is positive definite and has a Cholesky decomposition that can be computed. The variance is left unchanged. A covariance matrix that is not positive definite often occurs in high-dimensional problems. It can be due to multicollinearity, floating-point inaccuracies, or when the number of observations is smaller than the number of assets. For more details, see cov_nearest. The default is True.
- higham : bool, default=False
If set to True, the Higham (2002) algorithm is used to find the nearest PD covariance; otherwise, the eigenvalues are clipped to a threshold above zero (1e-13). The default is False, which uses the clipping method, as the Higham algorithm can be slow for large datasets.
- higham_max_iteration : int, default=100
Maximum number of iterations of the Higham (2002) algorithm. The default value is 100.
- Attributes:
- covariance_ : ndarray of shape (n_assets, n_assets)
Estimated covariance.
- n_features_in_ : int
Number of assets seen during fit.
- feature_names_in_ : ndarray of shape (n_features_in_,)
Names of assets seen during fit. Defined only when X has asset names that are all strings.
Methods
- fit(X[, y]): Fit the Gerber covariance estimator.
- get_metadata_routing(): Get metadata routing of this object.
- get_params([deep]): Get parameters for this estimator.
- mahalanobis(X_test): Compute the squared Mahalanobis distance of observations.
- score(X_test[, y]): Compute the mean log-likelihood of observations under the estimated model.
- set_params(**params): Set the parameters of this estimator.
- set_score_request(*[, X_test]): Configure whether metadata should be requested to be passed to the score method.
References
[1] Gerber, S., B. Javid, H. Markowitz, P. Sargen, and D. Starer (2022). “The Gerber statistic: A robust co-movement measure for portfolio optimization”. The Journal of Portfolio Management.
[2] Gerber, S., B. Javid, H. Markowitz, P. Sargen, and D. Starer (2019). “The Gerber statistic: A robust measure of correlation”.
[3] Gerber, S., H. Markowitz, and P. Pujara (2015). “Enhancing multi-asset portfolio construction under modern portfolio theory with a robust co-movement measure”. Social Science Research Network Working Paper Series.
[4] Flint and Polakow (2023). “Deconstructing the Gerber Statistic”.
- fit(X, y=None)[source]#
Fit the Gerber covariance estimator.
- Parameters:
- X : array-like of shape (n_observations, n_assets)
Price returns of the assets.
- y : Ignored
Not used, present for API consistency by convention.
- Returns:
- self : GerberCovariance
Fitted estimator.
- get_metadata_routing()#
Get metadata routing of this object.
Please check the User Guide on how the routing mechanism works.
- Returns:
- routing : MetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)#
Get parameters for this estimator.
- Parameters:
- deep : bool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- params : dict
Parameter names mapped to their values.
- mahalanobis(X_test)#
Compute the squared Mahalanobis distance of observations.
The squared Mahalanobis distance of an observation \(r\) is defined as:
\[d^2 = (r - \mu)^T \Sigma^{-1} (r - \mu)\]
where \(\Sigma\) is the estimated covariance matrix (self.covariance_) and \(\mu\) is the estimated mean (self.location_ if available, otherwise zero).
This distance measure accounts for correlations between assets and is useful for:
- Outlier detection in portfolio returns
- Risk-adjusted distance calculations
- Identifying unusual market regimes
- Parameters:
- X_test : array-like of shape (n_observations, n_assets) or (n_assets,)
Observations for which to compute the squared Mahalanobis distance. Each row represents one observation. If 1D, treated as a single observation. Assets with non-finite fitted variance are excluded from inference. Inside the retained inference subspace, the observations must be finite.
- Returns:
- distances : ndarray of shape (n_observations,) or float
Squared Mahalanobis distance for each observation. Returns a scalar if input is 1D.
Examples
>>> import numpy as np
>>> from skfolio.moments import EmpiricalCovariance
>>> X = np.random.randn(100, 3)
>>> model = EmpiricalCovariance()
>>> model.fit(X)
>>> distances = model.mahalanobis(X)
>>> # Distances approximately follow a chi-squared distribution with n_assets DoF
>>> print(f"Mean distance: {distances.mean():.2f}, Expected: {3:.2f}")
- score(X_test, y=None)#
Compute the mean log-likelihood of observations under the estimated model.
Evaluates how well the fitted covariance matrix explains new observations, assuming a multivariate Gaussian distribution. This is useful for:
- Model selection (comparing different covariance estimators)
- Cross-validation of covariance estimation methods
- Assessing goodness-of-fit
The log-likelihood for a single observation \(r\) is:
\[\log p(r | \mu, \Sigma) = -\frac{1}{2} \left[ n \log(2\pi) + \log|\Sigma| + (r - \mu)^T \Sigma^{-1} (r - \mu) \right]\]
where \(n\) is the number of assets, \(\Sigma\) is the estimated covariance matrix (self.covariance_), and \(\mu\) is the estimated mean (self.location_ if available, otherwise zero).
- Parameters:
- X_test : array-like of shape (n_observations, n_assets)
Observations for which to compute the log-likelihood. Typically held-out test data not used during fitting. Assets with non-finite fitted variance are excluded from inference. This typically happens when the fitted covariance cannot be estimated for an asset, for example before listing, after delisting, or during a warm-up period. After this asset-level filtering, each row of X_test is scored using the remaining available values only. This covers row-level missing values in X_test, such as market holidays or pre/post-listing.
- y : Ignored
Not used, present for scikit-learn API consistency.
- Returns:
- score : float
Mean log-likelihood of the observations. Higher values indicate better fit. The score is averaged over all observations.
Examples
>>> import numpy as np
>>> from skfolio.moments import EmpiricalCovariance, LedoitWolf
>>> X_train = np.random.randn(100, 5)
>>> X_test = np.random.randn(50, 5)
>>> emp = EmpiricalCovariance().fit(X_train)
>>> lw = LedoitWolf().fit(X_train)
>>> # Compare models on held-out data
>>> print(f"Empirical: {emp.score(X_test):.2f}")
>>> print(f"LedoitWolf: {lw.score(X_test):.2f}")
- set_params(**params)#
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
- Parameters:
- **params : dict
Estimator parameters.
- Returns:
- self : estimator instance
Estimator instance.
- set_score_request(*, X_test='$UNCHANGED$')#
Configure whether metadata should be requested to be passed to the score method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to score.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- Parameters:
- X_test : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for the X_test parameter in score.
- Returns:
- self : object
The updated object.