skfolio.moments.GerberCovariance#

class skfolio.moments.GerberCovariance(window_size=None, threshold=0.5, psd_variant=True, nearest=True, higham=False, higham_max_iteration=100)[source]#

Gerber Covariance estimator.

Robust co-movement measure which ignores fluctuations below a certain threshold while simultaneously limiting the effects of extreme movements. The Gerber statistic extends Kendall’s Tau by counting the proportion of simultaneous co-movements in series when their amplitudes exceed data-dependent thresholds.

Three variants have been published:

  • Gerber et al. (2015): tends to produce matrices that are non-PSD.

  • Gerber et al. (2019): alteration of the denominator of the above statistic.

  • Gerber et al. (2022): final alteration to ensure PSD matrix.

The last two variants are implemented.
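The threshold-counting idea behind the PSD (2022) variant can be sketched in a few lines of numpy. This is an illustrative simplification, not skfolio's internal implementation: `gerber_corr` is a hypothetical helper, plain sample standard deviations set the thresholds, and the normalized Gram-matrix construction is what guarantees positive semi-definiteness.

```python
import numpy as np

def gerber_corr(returns: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Illustrative Gerber-type correlation (PSD construction).

    An observation counts only when it exceeds +/- threshold * std:
    U[t, i] = +1 (large up move), -1 (large down move), 0 otherwise.
    Normalizing the Gram matrix U.T @ U yields a PSD matrix by construction.
    """
    s = returns.std(axis=0)
    upper = returns >= threshold * s
    lower = returns <= -threshold * s
    u = upper.astype(float) - lower.astype(float)
    gram = u.T @ u                  # concordant minus discordant counts
    norm = np.sqrt(np.diag(gram))   # per-asset exceedance counts
    return gram / np.outer(norm, norm)

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 4))
g = gerber_corr(X)
# Symmetric, unit diagonal, and PSD (eigenvalues >= 0 up to rounding)
assert np.allclose(g, g.T)
assert np.allclose(np.diag(g), 1.0)
assert np.linalg.eigvalsh(g).min() > -1e-10
```

Because the result is a normalized Gram matrix, it is PSD by construction, which is the property the 2022 variant was designed to guarantee.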

Parameters:
window_size : int, optional

Window size. The model is fitted on the last window_size observations. The default (None) is to use all the data.

threshold : float, default=0.5

Gerber threshold. The default value is 0.5.

psd_variant : bool, default=True

If this is set to True, the Gerber et al. (2022) variant is used to ensure a positive semi-definite matrix. Otherwise, the Gerber et al. (2019) variant is used. The default is True.

nearest : bool, default=True

If this is set to True, the covariance is replaced by the nearest covariance matrix that is positive definite and for which a Cholesky decomposition can be computed. The variance is left unchanged. A covariance matrix that is not positive definite often occurs in high-dimensional problems. It can be caused by multicollinearity, floating-point inaccuracies, or by the number of observations being smaller than the number of assets. For more details, see cov_nearest. The default is True.

higham : bool, default=False

If this is set to True, the Higham (2002) algorithm is used to find the nearest PD covariance; otherwise the eigenvalues are clipped at a small positive threshold (1e-13). The default is False, using the clipping method, because the Higham algorithm can be slow on large datasets.

higham_max_iteration : int, default=100

Maximum number of iterations of the Higham (2002) algorithm. The default value is 100.
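The eigenvalue-clipping fallback used when higham=False can be sketched directly with numpy. This is a generic illustration of the projection idea, not skfolio's exact routine; `clip_to_psd` is a hypothetical helper name.

```python
import numpy as np

def clip_to_psd(cov: np.ndarray, eps: float = 1e-13) -> np.ndarray:
    """Project a symmetric matrix toward the PSD cone by clipping eigenvalues."""
    eigval, eigvec = np.linalg.eigh(cov)  # eigh assumes symmetric input
    eigval = np.clip(eigval, eps, None)   # floor negative eigenvalues
    return eigvec @ np.diag(eigval) @ eigvec.T

# A symmetric matrix with one negative eigenvalue (not a valid covariance)
bad = np.array([[1.0, 0.95, 0.6],
                [0.95, 1.0, 0.95],
                [0.6, 0.95, 1.0]])
assert np.linalg.eigvalsh(bad).min() < 0       # indefinite before clipping
fixed = clip_to_psd(bad)
assert np.linalg.eigvalsh(fixed).min() >= -1e-12  # PSD after clipping
```

Clipping is cheap (one eigendecomposition) but does not minimize the distance to the original matrix under any particular norm, which is what the slower Higham algorithm provides.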

Attributes:
covariance_ : ndarray of shape (n_assets, n_assets)

Estimated covariance.

n_features_in_ : int

Number of assets seen during fit.

feature_names_in_ : ndarray of shape (n_features_in_,)

Names of assets seen during fit. Defined only when X has assets names that are all strings.

Methods

fit(X[, y])

Fit the Gerber covariance estimator.

get_metadata_routing()

Get metadata routing of this object.

get_params([deep])

Get parameters for this estimator.

mahalanobis(X_test)

Compute the squared Mahalanobis distance of observations.

score(X_test[, y])

Compute the mean log-likelihood of observations under the estimated model.

set_params(**params)

Set the parameters of this estimator.

set_score_request(*[, X_test])

Configure whether metadata should be requested to be passed to the score method.

References

[1]

“The Gerber Statistic: A Robust Co-Movement Measure for Portfolio Optimization”. The Journal of Portfolio Management. Gerber, S., B. Javid, H. Markowitz, P. Sargen, and D. Starer (2022).

[2]

“The Gerber Statistic: A Robust Measure of Correlation”. Gerber, S., B. Javid, H. Markowitz, P. Sargen, and D. Starer (2019).

[3]

“Enhancing Multi-Asset Portfolio Construction Under Modern Portfolio Theory with a Robust Co-Movement Measure”. Social Science Research Network Working Paper Series. Gerber, S., H. Markowitz, and P. Pujara (2015).

[4]

“Deconstructing the Gerber Statistic”. Flint & Polakow (2023).

fit(X, y=None)[source]#

Fit the Gerber covariance estimator.

Parameters:
X : array-like of shape (n_observations, n_assets)

Price returns of the assets.

y : Ignored

Not used, present for API consistency by convention.

Returns:
self : GerberCovariance

Fitted estimator.

get_metadata_routing()#

Get metadata routing of this object.

Please check User Guide on how the routing mechanism works.

Returns:
routing : MetadataRequest

A MetadataRequest encapsulating routing information.

get_params(deep=True)#

Get parameters for this estimator.

Parameters:
deep : bool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
params : dict

Parameter names mapped to their values.

mahalanobis(X_test)#

Compute the squared Mahalanobis distance of observations.

The squared Mahalanobis distance of an observation \(r\) is defined as:

\[d^2 = (r - \mu)^T \Sigma^{-1} (r - \mu)\]

where \(\Sigma\) is the estimated covariance matrix (self.covariance_) and \(\mu\) is the estimated mean (self.location_ if available, otherwise zero).

This distance measure accounts for correlations between assets and is useful for:

  • Outlier detection in portfolio returns

  • Risk-adjusted distance calculations

  • Identifying unusual market regimes

Parameters:
X_test : array-like of shape (n_observations, n_assets) or (n_assets,)

Observations for which to compute the squared Mahalanobis distance. Each row represents one observation. If 1D, treated as a single observation. Assets with non-finite fitted variance are excluded from inference. Inside the retained inference subspace, the observations must be finite.

Returns:
distances : ndarray of shape (n_observations,) or float

Squared Mahalanobis distance for each observation. Returns a scalar if input is 1D.

Examples

>>> import numpy as np
>>> from skfolio.moments import EmpiricalCovariance
>>> X = np.random.randn(100, 3)
>>> model = EmpiricalCovariance().fit(X)
>>> distances = model.mahalanobis(X)
>>> # Distances follow approximately chi-squared distribution with n_assets DoF
>>> print(f"Mean distance: {distances.mean():.2f}, Expected: {3:.2f}")
score(X_test, y=None)#

Compute the mean log-likelihood of observations under the estimated model.

Evaluates how well the fitted covariance matrix explains new observations, assuming a multivariate Gaussian distribution. This is useful for:

  • Model selection (comparing different covariance estimators)

  • Cross-validation of covariance estimation methods

  • Assessing goodness-of-fit

The log-likelihood for a single observation \(r\) is:

\[\log p(r | \mu, \Sigma) = -\frac{1}{2} \left[ n \log(2\pi) + \log|\Sigma| + (r - \mu)^T \Sigma^{-1} (r - \mu) \right]\]

where \(n\) is the number of assets, \(\Sigma\) is the estimated covariance matrix (self.covariance_), and \(\mu\) is the estimated mean (self.location_ if available, otherwise zero).
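The formula above can be checked by hand with numpy. This is a minimal sketch assuming a zero mean; `mean_gaussian_loglik` is a hypothetical helper, and skfolio's actual handling of `location_` and missing data is more involved.

```python
import numpy as np

def mean_gaussian_loglik(X: np.ndarray, cov: np.ndarray) -> float:
    """Mean Gaussian log-likelihood of the rows of X under N(0, cov)."""
    n = cov.shape[0]
    sign, logdet = np.linalg.slogdet(cov)   # stable log-determinant
    assert sign > 0, "covariance must be positive definite"
    # Quadratic form (r^T Sigma^{-1} r) for every observation at once
    quad = np.einsum("ti,ij,tj->t", X, np.linalg.inv(cov), X)
    return float(-0.5 * (n * np.log(2 * np.pi) + logdet + quad).mean())

rng = np.random.default_rng(42)
X = rng.standard_normal((1000, 3))
score = mean_gaussian_loglik(X, np.cov(X, rowvar=False))
# For standard normal data in 3 dims, the mean log-likelihood is close to
# -0.5 * 3 * (log(2*pi) + 1), about -4.26
```

Higher (less negative) values mean the covariance explains the observations better, which is what makes this score usable for cross-validated model selection.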

Parameters:
X_test : array-like of shape (n_observations, n_assets)

Observations for which to compute the log-likelihood. Typically held-out test data not used during fitting. Assets with non-finite fitted variance are excluded from inference. This typically happens when the fitted covariance cannot be estimated for an asset, for example before listing, after delisting, or during a warmup period. After this asset-level filtering, each row of X_test is scored using the remaining available values only. This covers row-level missing values in X_test, such as market holidays or pre/post-listing.

y : Ignored

Not used, present for scikit-learn API consistency.

Returns:
score : float

Mean log-likelihood of the observations. Higher values indicate better fit. The score is averaged over all observations.

Examples

>>> import numpy as np
>>> from skfolio.moments import EmpiricalCovariance, LedoitWolf
>>> X_train = np.random.randn(100, 5)
>>> X_test = np.random.randn(50, 5)
>>> emp = EmpiricalCovariance().fit(X_train)
>>> lw = LedoitWolf().fit(X_train)
>>> # Compare models on held-out data
>>> print(f"Empirical: {emp.score(X_test):.2f}")
>>> print(f"LedoitWolf: {lw.score(X_test):.2f}")
set_params(**params)#

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
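The `<component>__<parameter>` convention is easiest to see with a nested estimator. A short sketch using scikit-learn's Pipeline for illustration (any nested estimator, including GerberCovariance inside a meta-estimator, follows the same convention):

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Ridge

pipe = Pipeline([("scale", StandardScaler()), ("model", Ridge(alpha=1.0))])
# Double underscores address parameters of nested components
pipe.set_params(model__alpha=0.5, scale__with_mean=False)
assert pipe.get_params()["model__alpha"] == 0.5
assert pipe.get_params()["scale__with_mean"] is False
```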

Parameters:
**params : dict

Estimator parameters.

Returns:
self : estimator instance

Estimator instance.

set_score_request(*, X_test='$UNCHANGED$')#

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
X_test : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for X_test parameter in score.

Returns:
self : object

The updated object.