skfolio.model_selection.covariance_forecast_evaluation

skfolio.model_selection.covariance_forecast_evaluation(estimator, X, y=None, train_size=252, test_size=1, expand_train=False, portfolio_weights=None, purged_size=0, params=None)

Evaluate out-of-sample covariance forecast quality using walk-forward cross-validation.

At each fold the estimator is fitted from scratch on the training window and the fitted covariance is evaluated against the next test_size observations. This is the batch counterpart of online_covariance_forecast_evaluation, which instead updates the estimator incrementally via partial_fit.

The walk-forward scheme is controlled by train_size and expand_train, mirroring the semantics of WalkForward:

  • expand_train=False (default): rolling window of fixed train_size.

  • expand_train=True: expanding window starting from the first train_size observations.

Every evaluation window contains exactly test_size observations, ensuring that diagnostics (in particular QLIKE) are directly comparable across folds.
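The rolling/expanding split logic described above can be sketched as follows. This is a minimal illustration of the window semantics, not skfolio's actual implementation, and the helper name `walk_forward_windows` is hypothetical:

```python
import numpy as np

def walk_forward_windows(n_observations, train_size, test_size,
                         expand_train=False, purged_size=0):
    # Yield (train_indices, test_indices) pairs. Each test window has
    # exactly `test_size` observations; `purged_size` observations are
    # skipped between the end of the training window and the test window.
    test_start = train_size + purged_size
    while test_start + test_size <= n_observations:
        train_end = test_start - purged_size
        # Rolling: fixed-size window; expanding: always start at 0.
        train_start = 0 if expand_train else train_end - train_size
        yield (np.arange(train_start, train_end),
               np.arange(test_start, test_start + test_size))
        test_start += test_size
```

With `n_observations=10`, `train_size=4`, `test_size=2`, the rolling scheme produces three folds whose training windows slide forward by `test_size` each step, while `expand_train=True` keeps the training start pinned at the first observation.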

Four core diagnostics are computed:

  • Mahalanobis calibration ratio: tests whether the full covariance structure (all eigenvalue directions) is correctly specified. The target is 1.0. A value above 1.0 indicates underestimated risk; below 1.0 indicates overestimated risk.

  • Diagonal calibration ratio: tests whether the individual asset variances are correctly specified, ignoring correlations. The target is 1.0. A value above 1.0 indicates underestimated volatilities; below 1.0 indicates overestimated volatilities.

  • Portfolio standardized returns / bias statistic: tests whether the covariance is well calibrated along one or more portfolio directions. The bias statistic is the sample standard deviation of portfolio returns standardized by their forecast volatility; the target is 1.0.

  • Portfolio QLIKE: evaluates portfolio variance forecasts along one or more portfolio directions by comparing the forecast portfolio variance with the realized sum of squared portfolio returns over the evaluation window. Lower values indicate better portfolio variance forecasts.
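Under the standard definitions of these diagnostics, the two calibration ratios and the QLIKE loss can be sketched as below. These are illustrative formulas under one common parameterization; skfolio's exact implementation may differ, and the helper names are hypothetical:

```python
import numpy as np

def calibration_ratios(test_returns, sigma):
    # test_returns: (test_size, n_assets) realized returns for one window;
    # sigma: forecast covariance for that window.
    n_assets = sigma.shape[0]
    inv = np.linalg.inv(sigma)
    # Mahalanobis ratio: mean of r' Sigma^{-1} r / n_assets; target 1.0.
    # Above 1.0 -> risk underestimated, below 1.0 -> overestimated.
    maha = np.mean([r @ inv @ r for r in test_returns]) / n_assets
    # Diagonal ratio: squared returns over forecast variances, averaged
    # across assets and observations; target 1.0.
    diag = np.mean(test_returns ** 2 / np.diag(sigma))
    return maha, diag

def portfolio_qlike(forecast_var, realized_var):
    # QLIKE loss (one common parameterization): non-negative,
    # zero when the forecast equals the realized variance.
    x = realized_var / forecast_var
    return x - np.log(x) - 1.0
```

For example, doubling the forecast covariance while the realized returns stay fixed halves both calibration ratios, signaling overestimated risk, consistent with the "below 1.0" reading above.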

When the test returns contain NaNs (e.g. holidays, pre-listing, or post-delisting periods), only finite observations contribute to the aggregated return. For portfolio diagnostics, NaN returns for active assets contribute zero to the realized portfolio return and the forecast covariance is scaled by the pairwise observation count matrix \(H\) (Hadamard product \(H \odot \Sigma\)) so that the realized portfolio variance and forecast variance follow the same missing-data convention. In skfolio, NaN diagonal entries in the forecast covariance mark inactive assets, which are excluded from the evaluation.
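The \(H \odot \Sigma\) convention above can be sketched as follows. This is a minimal illustration of the missing-data scaling, with a hypothetical helper name; skfolio's internals may differ:

```python
import numpy as np

def masked_window_variances(weights, sigma, test_returns):
    # Pairwise observation-count matrix H: H[i, j] counts observations
    # in the window where both asset i and asset j are finite.
    finite = np.isfinite(test_returns).astype(float)
    H = finite.T @ finite
    # Forecast portfolio variance: scale the covariance elementwise by H
    # (Hadamard product) so it follows the same missing-data convention
    # as the realized side.
    forecast_var = weights @ (H * sigma) @ weights
    # Realized side: NaN returns contribute zero to the portfolio return;
    # realized variance is the sum of squared portfolio returns.
    port_returns = np.where(np.isfinite(test_returns), test_returns, 0.0) @ weights
    realized_var = np.sum(port_returns ** 2)
    return forecast_var, realized_var
```

With no NaNs in the window, `H` is constant at `test_size`, so the forecast variance reduces to `test_size * w' Σ w`, matching the sum of squared portfolio returns in expectation.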

Parameters:
estimator : BaseEstimator or Pipeline

Covariance estimator or Pipeline to evaluate. It is fitted from scratch on each training window and must expose covariance_ or return_distribution_.covariance after fitting.

X : array-like of shape (n_observations, n_assets)

Asset returns.

y : Ignored

Present for scikit-learn API compatibility.

train_size : int, default=252

Number of observations in each training window (rolling or initial expanding window size).

test_size : int, default=1

Number of observations per evaluation window. All windows have exactly this many observations.

expand_train : bool, default=False

If True, each subsequent training window includes all past observations (expanding window). If False, a rolling window of fixed train_size is used.

portfolio_weights : array-like of shape (n_assets,) or (n_portfolios, n_assets), optional

Portfolio weights for portfolio-level diagnostics (bias statistic and QLIKE).

If None (default), inverse-volatility weights are used, recomputed dynamically at each step from the forecast covariance. This neutralizes volatility dispersion so that high-volatility assets do not dominate the diagnostic.

If a 1D array is provided, a single static portfolio is used.

If a 2D array of shape (n_portfolios, n_assets) is provided, each row defines a test portfolio and diagnostics are computed independently for each.

For equal-weight calibration, pass portfolio_weights=np.ones(n_assets) / n_assets.
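The dynamic inverse-volatility default described above can be sketched as follows, computed from the forecast covariance at each step (hypothetical helper name, not skfolio's internal function):

```python
import numpy as np

def inverse_volatility_weights(sigma):
    # Weight each asset by the inverse of its forecast volatility
    # (square root of the covariance diagonal), normalized to sum to 1.
    # This neutralizes volatility dispersion across assets.
    inv_vol = 1.0 / np.sqrt(np.diag(sigma))
    return inv_vol / inv_vol.sum()
```

An asset with twice the volatility of another receives half its weight, so high-volatility assets do not dominate the portfolio-level diagnostics.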

purged_size : int, default=0

Number of observations to skip between training and test data.

params : dict, optional

Parameters routed to the estimator’s fit via metadata routing.

Returns:
evaluation : CovarianceForecastEvaluation

Frozen dataclass with per-step calibration arrays, summary statistics, and plotting methods.

Raises:
ValueError

If the data is too short for at least one evaluation fold.

See also

online_covariance_forecast_evaluation

Online counterpart that updates the estimator incrementally via partial_fit.

CovarianceForecastEvaluation

Result dataclass with summary statistics and plotting methods.

CovarianceForecastComparison

Compare multiple evaluation results side by side with combined summary tables and overlay plots.

Examples

>>> from skfolio.datasets import load_sp500_dataset
>>> from skfolio.model_selection import covariance_forecast_evaluation
>>> from skfolio.moments import LedoitWolf
>>> from skfolio.preprocessing import prices_to_returns
>>>
>>> prices = load_sp500_dataset()
>>> X = prices_to_returns(prices)
>>> evaluation = covariance_forecast_evaluation(
...     LedoitWolf(),
...     X,
...     train_size=252,
...     test_size=5,
... )
>>> evaluation.summary()
>>> evaluation.bias_statistic
>>> evaluation.plot_calibration()