skfolio.moments.GraphicalLassoCV#
- class skfolio.moments.GraphicalLassoCV(alphas=4, n_refinements=4, cv=None, tol=0.0001, enet_tol=0.0001, max_iter=100, mode='cd', n_jobs=None, verbose=False, assume_centered=False, nearest=True, higham=False, higham_max_iteration=100)[source]#
Sparse inverse covariance with cross-validated choice of the l1 penalty.
Read more in scikit-learn.
- Parameters:
- alphas : int or array-like of shape (n_alphas,), dtype=float, default=4
If an integer is given, it fixes the number of points on the grids of alpha to be used. If a list is given, it gives the grid to be used. See the notes in the class docstring for more details. Range is [1, inf) for an integer. Range is (0, inf] for an array-like of floats.
- n_refinements : int, default=4
The number of times the grid is refined. Not used if explicit values of alphas are passed. Range is [1, inf).
- cv : int, cross-validation generator or iterable, default=None
Determines the cross-validation splitting strategy. Possible inputs for cv are:
None, to use the default 5-fold cross-validation;
an integer, to specify the number of folds;
a CV splitter;
an iterable yielding (train, test) splits as arrays of indices.
For integer/None inputs, KFold is used.
- tol : float, default=1e-4
The tolerance to declare convergence: if the dual gap goes below this value, iterations are stopped. Range is (0, inf].
- enet_tol : float, default=1e-4
The tolerance for the elastic net solver used to calculate the descent direction. This parameter controls the accuracy of the search direction for a given column update, not of the overall parameter estimate. Only used for mode=’cd’. Range is (0, inf].
- max_iter : int, default=100
Maximum number of iterations.
- mode : {‘cd’, ‘lars’}, default=’cd’
The Lasso solver to use: coordinate descent or LARS. Use LARS for very sparse underlying graphs, where the number of features is greater than the number of samples. Elsewhere prefer cd, which is more numerically stable.
- n_jobs : int, default=None
Number of jobs to run in parallel. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors.
- verbose : bool, default=False
If verbose is True, the objective function and duality gap are printed at each iteration.
- assume_centered : bool, default=False
If True, data are not centered before computation. Useful when working with data whose mean is almost, but not exactly, zero. If False, data are centered before computation.
- Attributes:
- covariance_ : ndarray of shape (n_assets, n_assets)
Estimated covariance.
- location_ : ndarray of shape (n_assets,)
Estimated location, i.e. the estimated mean.
- precision_ : ndarray of shape (n_assets, n_assets)
Estimated pseudo-inverse matrix (stored only if store_precision is True).
- alpha_ : float
Penalization parameter selected.
- cv_results_ : dict of ndarrays
A dict with keys:
- alphas : ndarray of shape (n_alphas,)
All penalization parameters explored.
- split(k)_test_score : ndarray of shape (n_alphas,)
Log-likelihood score on left-out data across the (k)th fold.
Added in version 1.0.
- mean_test_score : ndarray of shape (n_alphas,)
Mean of scores over the folds.
Added in version 1.0.
- std_test_score : ndarray of shape (n_alphas,)
Standard deviation of scores over the folds.
Added in version 1.0.
- n_iter_ : int
Number of iterations run for the optimal alpha.
- n_features_in_ : int
Number of assets seen during fit.
- feature_names_in_ : ndarray of shape (n_features_in_,)
Names of features seen during fit. Defined only when X has feature names that are all strings.
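A minimal fitting sketch showing the attributes above (assuming scikit-learn is available; skfolio.moments.GraphicalLassoCV mirrors sklearn.covariance.GraphicalLassoCV, so the fitted attributes carry over):

```python
import numpy as np
from sklearn.covariance import GraphicalLassoCV

rng = np.random.default_rng(42)
X = rng.standard_normal((200, 5))  # 200 observations of 5 assets

model = GraphicalLassoCV(cv=3).fit(X)

print(model.alpha_)              # penalty selected by cross-validation
print(model.covariance_.shape)   # (5, 5) estimated covariance
print(model.precision_.shape)    # (5, 5) sparse inverse covariance
```

The cross-validation scores explored along the refined alpha grid are available in model.cv_results_ under the "alphas", "mean_test_score", and "std_test_score" keys.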
Methods
error_norm(comp_cov[, norm, scaling, squared])  Compute the Mean Squared Error between two covariance estimators.
fit(X[, y])  Fit the GraphicalLasso covariance model to X.
get_metadata_routing()  Get metadata routing of this object.
get_params([deep])  Get parameters for this estimator.
get_precision()  Getter for the precision matrix.
mahalanobis(X_test)  Compute the squared Mahalanobis distance of observations.
score(X_test[, y])  Compute the mean log-likelihood of observations under the estimated model.
set_params(**params)  Set the parameters of this estimator.
set_score_request(*[, X_test])  Configure whether metadata should be requested to be passed to the score method.
Notes
The search for the optimal penalization parameter (alpha) is done on an iteratively refined grid: first the cross-validated scores on a grid are computed, then a new refined grid is centered around the maximum, and so on.
One of the challenges faced here is that the solvers can fail to converge to a well-conditioned estimate. The corresponding values of alpha then come out as missing values, but the optimum may be close to these missing values.
In fit, once the best parameter alpha is found through cross-validation, the model is fit again using the entire training set.
- error_norm(comp_cov, norm='frobenius', scaling=True, squared=True)#
Compute the Mean Squared Error between two covariance estimators.
- Parameters:
- comp_cov : array-like of shape (n_features, n_features)
The covariance to compare with.
- norm : {“frobenius”, “spectral”}, default=”frobenius”
The type of norm used to compute the error. Available error types:
‘frobenius’ (default): sqrt(tr(A^T A))
‘spectral’: sqrt(max(eigenvalues(A^T A)))
where A is the error (comp_cov - self.covariance_).
- scaling : bool, default=True
If True (default), the squared error norm is divided by n_features. If False, the squared error norm is not rescaled.
- squared : bool, default=True
Whether to compute the squared error norm or the error norm. If True (default), the squared error norm is returned. If False, the error norm is returned.
- Returns:
- result : float
The Mean Squared Error (in the sense of the Frobenius norm) between self and comp_cov covariance estimators.
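To make the Frobenius variant concrete, here is a small NumPy sketch of the computation (a hypothetical re-implementation for illustration only; `frobenius_error` is our own name, not a library function):

```python
import numpy as np

def frobenius_error(cov_est, comp_cov, scaling=True, squared=True):
    # A is the error matrix; the Frobenius norm is sqrt(tr(A^T A)),
    # i.e. the square root of the sum of squared entries.
    A = comp_cov - cov_est
    sq_norm = np.trace(A.T @ A)
    if scaling:
        sq_norm /= cov_est.shape[0]  # divide the squared norm by n_features
    return sq_norm if squared else np.sqrt(sq_norm)

# Two 3x3 covariances differing by the identity: 3 squared unit entries,
# scaled by n_features=3, give a squared error of 1.0.
print(frobenius_error(np.eye(3), 2.0 * np.eye(3)))  # 1.0
```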
- fit(X, y=None, **fit_params)[source]#
Fit the GraphicalLasso covariance model to X.
- Parameters:
- X : array-like of shape (n_observations, n_assets)
Price returns of the assets.
- y : Ignored
Not used, present for API consistency by convention.
- Returns:
- self : GraphicalLassoCV
Fitted estimator.
- get_metadata_routing()[source]#
Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
Added in version 1.5.
- Returns:
- routing : MetadataRouter
A MetadataRouter encapsulating routing information.
- get_params(deep=True)#
Get parameters for this estimator.
- Parameters:
- deep : bool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- params : dict
Parameter names mapped to their values.
- get_precision()#
Getter for the precision matrix.
- Returns:
- precision_ : array-like of shape (n_features, n_features)
The precision matrix associated to the current covariance object.
- mahalanobis(X_test)#
Compute the squared Mahalanobis distance of observations.
The squared Mahalanobis distance of an observation \(r\) is defined as:
\[d^2 = (r - \mu)^T \Sigma^{-1} (r - \mu)\]
where \(\Sigma\) is the estimated covariance matrix (self.covariance_) and \(\mu\) is the estimated mean (self.location_ if available, otherwise zero).
This distance measure accounts for correlations between assets and is useful for:
Outlier detection in portfolio returns
Risk-adjusted distance calculations
Identifying unusual market regimes
- Parameters:
- X_test : array-like of shape (n_observations, n_assets) or (n_assets,)
Observations for which to compute the squared Mahalanobis distance. Each row represents one observation. If 1D, treated as a single observation. Assets with non-finite fitted variance are excluded from inference. After this asset-level filtering, each row is evaluated using the remaining available values only, covering row-level missing values such as market holidays or pre/post-listing. When rows have different observation patterns, the returned distances follow \(\chi^2\) distributions with different degrees of freedom. Rows with no finite retained observation return NaN.
- Returns:
- distances : ndarray of shape (n_observations,) or float
Squared Mahalanobis distance for each observation. Returns a scalar if input is 1D.
Examples
>>> import numpy as np
>>> from skfolio.moments import EmpiricalCovariance
>>> X = np.random.randn(100, 3)
>>> model = EmpiricalCovariance()
>>> model.fit(X)
>>> distances = model.mahalanobis(X)
>>> # Distances approximately follow a chi-squared distribution with n_assets DoF
>>> print(f"Mean distance: {distances.mean():.2f}, Expected: {3:.2f}")
- score(X_test, y=None)#
Compute the mean log-likelihood of observations under the estimated model.
Evaluates how well the fitted covariance matrix explains new observations, assuming a multivariate Gaussian distribution. This is useful for:
Model selection (comparing different covariance estimators)
Cross-validation of covariance estimation methods
Assessing goodness-of-fit
The log-likelihood for a single observation \(r\) is:
\[\log p(r \mid \mu, \Sigma) = -\frac{1}{2} \left[ n \log(2\pi) + \log|\Sigma| + (r - \mu)^T \Sigma^{-1} (r - \mu) \right]\]
where \(n\) is the number of assets, \(\Sigma\) is the estimated covariance matrix (self.covariance_), and \(\mu\) is the estimated mean (self.location_ if available, otherwise zero).
- Parameters:
- X_test : array-like of shape (n_observations, n_assets)
Observations for which to compute the log-likelihood. Typically held-out test data not used during fitting. Assets with non-finite fitted variance are excluded from inference. This typically happens when the fitted covariance cannot be estimated for an asset, for example before listing, after delisting, or during a warmup period. After this asset-level filtering, each row of X_test is scored using the remaining available values only. This covers row-level missing values in X_test, such as market holidays or pre/post-listing.
- y : Ignored
Not used, present for scikit-learn API consistency.
- Returns:
- score : float
Mean log-likelihood of the observations. Higher values indicate better fit. The score is averaged over all observations.
Examples
>>> import numpy as np
>>> from skfolio.moments import EmpiricalCovariance, LedoitWolf
>>> X_train = np.random.randn(100, 5)
>>> X_test = np.random.randn(50, 5)
>>> emp = EmpiricalCovariance().fit(X_train)
>>> lw = LedoitWolf().fit(X_train)
>>> # Compare models on held-out data
>>> print(f"Empirical: {emp.score(X_test):.2f}")
>>> print(f"LedoitWolf: {lw.score(X_test):.2f}")
- set_params(**params)#
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
- **params : dict
Estimator parameters.
- Returns:
- self : estimator instance
Estimator instance.
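A brief round-trip sketch of get_params and set_params (assuming scikit-learn is installed; the API is identical on skfolio estimators):

```python
from sklearn.covariance import GraphicalLassoCV

model = GraphicalLassoCV()
model.set_params(max_iter=200, tol=1e-3)  # update two hyperparameters in place

params = model.get_params()
print(params["max_iter"], params["tol"])  # 200 0.001
```

Nested parameters use the <component>__<parameter> form, e.g. pipeline.set_params(covariance__max_iter=200) when the estimator sits inside a Pipeline step named "covariance".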
- set_score_request(*, X_test='$UNCHANGED$')#
Configure whether metadata should be requested to be passed to the score method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- Parameters:
- X_test : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for the X_test parameter in score.
- Returns:
- self : object
The updated object.