skfolio.model_selection.CovarianceForecastEvaluation

class skfolio.model_selection.CovarianceForecastEvaluation(observations, horizon, squared_mahalanobis_distance, mahalanobis_calibration_ratio, diagonal_calibration_ratio, portfolio_standardized_return, portfolio_variance_qlike_loss, n_valid_assets, n_portfolios, name=None)

Out-of-sample covariance forecast evaluation.

Stores per-step calibration diagnostics produced by covariance_forecast_evaluation or online_covariance_forecast_evaluation and provides summary statistics and plots.

The four core diagnostics are:

Mahalanobis calibration ratio: tests whether the full covariance structure (all eigenvalue directions) is correctly specified. At each step, let \(r_t\) be the one-period realized return vector, let \(R^{(h)}\) be the return aggregated over the evaluation window of \(h\) observations, and let \(\Sigma\) be the forecast covariance. The squared Mahalanobis distance \(d^2 = {R^{(h)}}^\top(h\,\Sigma)^{-1}R^{(h)}\) yields the calibration ratio \(d^2 / n\), where \(n\) is the number of active assets. The target is 1.0: a value above 1.0 indicates underestimated risk, and a value below 1.0 indicates overestimated risk.
Diagonal calibration ratio: tests whether the individual asset variances are correctly specified, ignoring correlations. Computed as \(\frac{1}{n}\sum_i (R_i^{(h)})^2 / (h_i\,\sigma_i^2)\), where \(\sigma_i^2\) is the forecast variance of asset \(i\) and \(h_i\) is the number of finite returns for asset \(i\) in the evaluation window. The target is 1.0: a value above 1.0 indicates underestimated volatilities, and a value below 1.0 indicates overestimated volatilities.
Portfolio standardized returns: tests whether the covariance is well calibrated along one or more portfolio directions rather than across all directions. For a portfolio with weights \(w\), the realized portfolio return at step \(t\) is standardized by the matching forecast portfolio volatility: \(b_t = r_p / \hat\sigma_p\) with \(r_p = w^\top R^{(h)}\) and \(\hat\sigma_p^{2} = w^\top(h\,\Sigma)w\). Under correct calibration, \(b_t\) has mean 0 and standard deviation 1 across evaluation steps. The bias statistic \(B = \mathrm{std}(b_t)\) summarizes forecast quality: \(B \approx 1\) is well calibrated, \(B > 1\) indicates underestimated risk, and \(B < 1\) indicates overestimated risk.
Portfolio QLIKE: evaluates portfolio variance forecasts along one or more portfolio directions by comparing the forecast portfolio variance with the realized sum of squared portfolio returns over the evaluation window. Lower values indicate better portfolio variance forecasts.
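The first three diagnostics can be sketched for a single evaluation step with NumPy. This is an illustrative sketch, not the library's implementation: the function name `step_diagnostics` is hypothetical, the data are assumed to be free of NaNs, and the aggregated return \(R^{(h)}\) is assumed to be the sum of the one-period returns in the window.

```python
import numpy as np


def step_diagnostics(window_returns, covariance, weights):
    """Diagnostics for one evaluation step (hypothetical helper, complete data only).

    window_returns : (h, n) one-period returns in the evaluation window.
    covariance     : (n, n) forecast covariance Sigma.
    weights        : (n,) test portfolio weights w.
    """
    h, n = window_returns.shape
    # Assumption: R^(h) is the sum of the one-period returns in the window.
    aggregated = window_returns.sum(axis=0)
    scaled_cov = h * covariance

    # Squared Mahalanobis distance d^2 = R^T (h Sigma)^{-1} R, via a linear solve.
    d2 = aggregated @ np.linalg.solve(scaled_cov, aggregated)
    mahalanobis_ratio = d2 / n

    # Diagonal ratio: mean of R_i^2 / (h sigma_i^2), ignoring correlations.
    diagonal_ratio = np.mean(aggregated**2 / np.diag(scaled_cov))

    # Portfolio standardized return b = w^T R / sqrt(w^T (h Sigma) w).
    standardized = (weights @ aggregated) / np.sqrt(weights @ scaled_cov @ weights)
    return d2, mahalanobis_ratio, diagonal_ratio, standardized
```

Collecting the returned values over many steps gives the per-step arrays that the class stores; a single step on its own says little about calibration.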

When X_test contains NaNs (e.g. holidays, pre-listing, or post-delisting periods), only finite observations contribute to the aggregated return. For portfolio diagnostics, NaN returns for active assets contribute zero to the realized portfolio return and the forecast covariance is scaled by the pairwise observation count matrix \(H\) (Hadamard product \(H \odot \Sigma\)) so that the realized portfolio variance and forecast variance follow the same missing-data convention. In skfolio, NaN diagonal entries in the forecast covariance mark inactive assets, which are excluded from the evaluation.
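The missing-data convention for portfolio diagnostics can be illustrated as follows. This is a sketch under stated assumptions, not the library's code: the helper name is hypothetical, all assets are assumed active (no NaN diagonal entries in the forecast covariance), and \(H_{ij}\) is assumed to count the observations in the window where both asset \(i\) and asset \(j\) have finite returns.

```python
import numpy as np


def portfolio_step_with_missing(window_returns, covariance, weights):
    """Standardized portfolio return when the window contains NaNs (hypothetical helper)."""
    finite = np.isfinite(window_returns)  # (h, n) mask of observed returns
    counts = finite.astype(float)
    # Assumption: H[i, j] = number of periods where both i and j are observed.
    H = counts.T @ counts
    # NaN returns contribute zero to the realized portfolio return.
    filled = np.where(finite, window_returns, 0.0)
    portfolio_return = weights @ filled.sum(axis=0)
    # Forecast variance uses the Hadamard product H * Sigma instead of h * Sigma.
    forecast_var = weights @ (H * covariance) @ weights
    return portfolio_return / np.sqrt(forecast_var), H
```

With no missing values every entry of \(H\) equals \(h\), so \(H \odot \Sigma\) reduces to \(h\,\Sigma\) and the complete-data formula is recovered.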

When multiple test portfolios are provided, portfolio-level diagnostics are computed for each portfolio independently. The cross-portfolio distribution of bias statistics reveals anisotropic calibration errors that a single portfolio might miss.

Parameters:

observations : ndarray of shape (n_steps,)
    Time index labels for each evaluation step.
horizon : int
    Number of observations per evaluation window. Every window has exactly this many observations.
squared_mahalanobis_distance : ndarray of shape (n_steps,)
    Squared Mahalanobis distance \(d_t^2 = {R_t^{(h)}}^\top(h\,\Sigma_t)^{-1}R_t^{(h)}\). Under correct Gaussian calibration each value follows a \(\chi^2(n)\) distribution, where \(n\) is the number of active assets.
mahalanobis_calibration_ratio : ndarray of shape (n_steps,)
    \(d_t^2 / n\), where \(n\) is the number of active assets. Target is 1.0. Tests whether the full covariance structure (all eigenvalue directions) is correctly specified.
diagonal_calibration_ratio : ndarray of shape (n_steps,)
    \(\frac{1}{n}\sum_i (R_{i,t}^{(h)})^2 / (h_{i,t}\,\sigma_{i,t}^2)\). Target is 1.0. Tests individual asset variances only.
portfolio_standardized_return : ndarray of shape (n_steps, n_portfolios)
    \(b_t = r_{p,t} / \hat\sigma_{p,t}\). Target mean is 0.0 and target std is 1.0 (the bias statistic).
portfolio_variance_qlike_loss : ndarray of shape (n_steps, n_portfolios)
    \(\log(\hat\sigma_{p,t}^{2}) + \sum_{j=1}^{h} r_{p,t,j}^{2} / \hat\sigma_{p,t}^{2}\). Compares the forecast portfolio variance with the realized sum of squared portfolio returns over the evaluation window. Lower values are better.
n_valid_assets : ndarray of shape (n_steps,)
    Number of active assets used at each evaluation step.
n_portfolios : int
    Number of test portfolios.
name : str or None, default=None
    Display name for the evaluation.

Attributes:

bias_statistic: Per-portfolio bias statistic.
name

Methods

`bias_statistic_summary`()	Cross-portfolio distribution of bias statistics.
`exceedance_summary`([confidence_levels])	Exceedance rate summary.
`plot_calibration`([diagnostics, window, title])	Rolling calibration diagnostics over time.
`plot_exceedance`([confidence_levels, window, ...])	Rolling exceedance rates over time.
`plot_qlike_loss`([window, title])	Rolling portfolio QLIKE loss over time.
`summary`()	Consolidated summary statistics.

Examples

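>>> from skfolio.datasets import load_sp500_dataset
>>> from skfolio.model_selection import online_covariance_forecast_evaluation
>>> from skfolio.moments import EWCovariance
>>> from skfolio.preprocessing import prices_to_returns
>>>
>>> # Build the asset returns X from the bundled S&P 500 price dataset.
>>> X = prices_to_returns(load_sp500_dataset())
>>>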
>>> evaluation = online_covariance_forecast_evaluation(
...     EWCovariance(half_life=30),
...     X,
...     warmup_size=252,
... )
>>> evaluation.summary()
>>> evaluation.plot_calibration()

property bias_statistic

Per-portfolio bias statistic.

Computed as the sample standard deviation of the portfolio standardized returns \(B_k = \mathrm{std}(b_{k,t})\) for each test portfolio \(k\).

A value near 1.0 indicates well-calibrated risk forecasts. Values above 1.0 indicate underestimated risk; values below 1.0 indicate overestimated risk.
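As a rough illustration, the per-portfolio bias statistic can be reproduced from an array of standardized returns of shape (n_steps, n_portfolios). The function name is hypothetical, and the use of `ddof=1` for the sample standard deviation is an assumption rather than something this page states.

```python
import numpy as np


def bias_statistic_sketch(standardized_returns):
    """Column-wise standard deviation of b_{k,t} (hypothetical helper)."""
    b = np.asarray(standardized_returns, dtype=float)
    # Assumption: "sample standard deviation" means ddof=1.
    return np.std(b, axis=0, ddof=1)


# Two portfolios over four steps; the second has twice the dispersion.
b = np.array([[1.0, 2.0], [-1.0, -2.0], [1.0, 2.0], [-1.0, -2.0]])
bias = bias_statistic_sketch(b)
```

Both values here are above 1.0, which under the interpretation above would point to underestimated risk, more severely so for the second portfolio.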

Returns:

bias : ndarray of shape (n_portfolios,)

bias_statistic_summary()

Cross-portfolio distribution of bias statistics.

Computes percentiles of bias statistics across test portfolios. This is useful for evaluating covariance forecast quality using a set of representative portfolios.

Under Gaussian returns with perfect forecasts, \(B^2(T-1)\) follows a \(\chi^2(T-1)\) distribution, where \(T\) is the number of evaluation steps. Reference bands can be derived from the appropriate chi-squared quantiles: \(B_{p} = \sqrt{\chi^2_{p}(T-1) / (T-1)}\), where \(\chi^2_{p}(T-1)\) denotes the \(p\)-quantile of the chi-squared distribution with \(T-1\) degrees of freedom. In financial return series, heavy tails widen these bands because the sampling variance of \(B\) increases.
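The reference band formula can be evaluated directly with SciPy's chi-squared quantile function. This is a minimal sketch of the Gaussian band only; the function name and the default 5% and 95% levels are illustrative choices, not part of the class.

```python
import numpy as np
from scipy.stats import chi2


def bias_reference_band(n_steps, lower=0.05, upper=0.95):
    """Gaussian reference band B_p = sqrt(chi2_p(T-1) / (T-1)) (hypothetical helper)."""
    dof = n_steps - 1
    quantiles = chi2.ppf([lower, upper], dof)
    return np.sqrt(quantiles / dof)


band_short = bias_reference_band(253)   # roughly one year of daily steps
band_long = bias_reference_band(1001)   # a longer evaluation sample
```

The band tightens around 1.0 as the number of evaluation steps grows, so the same observed bias statistic is more informative on a longer sample. Since heavy tails widen the true band, these Gaussian limits are best read as a lower bound on the sampling uncertainty.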

Returns:

summary : Series

exceedance_summary(confidence_levels=(0.95, 0.99))

Exceedance rate summary.

Compares squared Mahalanobis distances to upper \(\chi^2(n)\) quantile thresholds, where \(n\) is the number of active assets, and reports how often the thresholds are exceeded. The rate is sensitive not only to covariance misspecification but also to heavy tails, regime shifts, and non-Gaussian standardized returns. It is best used as a comparative metric across estimators rather than as an absolute calibration test.

Parameters:

confidence_levels : tuple of float, default=(0.95, 0.99)
    Confidence levels used to define the upper chi-squared thresholds.

Returns:

summary : DataFrame
    Indexed by confidence_level with columns observed_rate and deviation, where deviation is measured relative to the target exceedance rate \(1 - \text{confidence\_level}\).
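The exceedance computation can be sketched as below. It is an illustration under assumptions, not the method's source: the function name is hypothetical, the thresholds use one chi-squared quantile per step with degrees of freedom equal to that step's number of active assets, and the sign convention deviation = observed rate minus target rate is assumed.

```python
import numpy as np
from scipy.stats import chi2


def exceedance_rates(squared_distances, n_valid_assets, confidence_levels=(0.95, 0.99)):
    """Return (confidence_level, observed_rate, deviation) tuples (hypothetical helper)."""
    d2 = np.asarray(squared_distances, dtype=float)
    dof = np.broadcast_to(np.asarray(n_valid_assets), d2.shape)
    rows = []
    for level in confidence_levels:
        threshold = chi2.ppf(level, dof)       # per-step upper threshold
        observed = float(np.mean(d2 > threshold))
        # Assumption: deviation = observed rate - target rate (1 - level).
        rows.append((level, observed, observed - (1.0 - level)))
    return rows


# Four steps with three active assets; only the last distance is extreme.
rates = exceedance_rates([1.0, 2.0, 3.0, 100.0], n_valid_assets=3)
```

With so few steps the observed rate is very coarse; in practice the comparison is meaningful only over a long evaluation sample and, as noted above, mainly across estimators.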

plot_calibration(diagnostics=('mahalanobis', 'diagonal', 'bias'), window=50, title=None)

Rolling calibration diagnostics over time.

Plots rolling calibration diagnostics with a reference line at 1.0. By default all three diagnostics are shown: rolling mean of the Mahalanobis ratio, rolling mean of the diagonal ratio, and rolling standard deviation of the portfolio standardized return (bias statistic).

For multiple portfolios, the bias statistic shows the median across portfolios with a P5-P95 shaded band.
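The quantities behind such a plot can be approximated with NumPy's sliding windows. This sketch only mirrors the description above; the function name is hypothetical, `ddof=1` is an assumption, and how the library treats the first `window - 1` steps is not covered here (the sketch simply drops them).

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def rolling_calibration(ratio, standardized_returns, window=50):
    """Rolling mean of a ratio and rolling bias band across portfolios (hypothetical helper)."""
    ratio = np.asarray(ratio, dtype=float)
    b = np.asarray(standardized_returns, dtype=float)  # (n_steps, n_portfolios)

    # Rolling mean of the calibration ratio: one value per full window.
    rolling_ratio = sliding_window_view(ratio, window).mean(axis=-1)

    # Windows along time: shape (n_steps - window + 1, n_portfolios, window).
    b_windows = sliding_window_view(b, window, axis=0)
    rolling_bias = b_windows.std(axis=-1, ddof=1)

    # Median line and P5-P95 band across portfolios at each time point.
    p5, median, p95 = np.percentile(rolling_bias, [5, 50, 95], axis=1)
    return rolling_ratio, median, p5, p95


ratio = np.arange(1.0, 7.0)
pattern = np.array([1.0, -1.0, 1.0, -1.0, 1.0, -1.0])[:, None]
b = pattern * np.array([1.0, 2.0, 3.0])  # three portfolios with increasing dispersion
rolling_ratio, median, p5, p95 = rolling_calibration(ratio, b, window=3)
```

A wide P5-P95 band relative to the median suggests that calibration differs noticeably between portfolio directions.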

Parameters:

diagnostics : tuple of str, default=("mahalanobis", "diagonal", "bias")
    Which diagnostics to include. Valid values are "mahalanobis", "diagonal", and "bias".
window : int, default=50
    Rolling window length.
title : str, optional
    Custom figure title.

Returns:

fig : go.Figure

plot_exceedance(confidence_levels=(0.95, 0.99), window=50, title=None)

Rolling exceedance rates over time.

Parameters:

confidence_levels : tuple of float, default=(0.95, 0.99)
    Confidence levels used to define the upper chi-squared thresholds.
window : int, default=50
    Rolling window length.
title : str, optional
    Custom figure title.

Returns:

fig : go.Figure

plot_qlike_loss(window=50, title=None)

Rolling portfolio QLIKE loss over time.

The QLIKE loss compares the forecast portfolio variance with the realized sum of squared portfolio returns over the evaluation window. Lower values are better.

For multiple portfolios, a shaded band shows the P5-P95 range across portfolios, with a line for the median.
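For reference, the per-step loss being plotted can be sketched from the formula given in the class parameters, \(\log(\hat\sigma_{p}^{2}) + \sum_{j=1}^{h} r_{p,j}^{2} / \hat\sigma_{p}^{2}\) with \(\hat\sigma_p^{2} = w^\top(h\,\Sigma)w\). The function name is hypothetical and the data are assumed to be free of NaNs.

```python
import numpy as np


def portfolio_qlike(window_returns, covariance, weights):
    """QLIKE loss for one step and one portfolio (hypothetical helper, complete data only)."""
    h = window_returns.shape[0]
    portfolio_returns = window_returns @ weights           # r_{p,j} for j = 1..h
    forecast_var = weights @ (h * covariance) @ weights    # w^T (h Sigma) w
    return np.log(forecast_var) + np.sum(portfolio_returns**2) / forecast_var


returns = np.array([[1.0, 0.0], [1.0, 2.0]])
w = np.array([0.5, 0.5])
loss_identity = portfolio_qlike(returns, np.eye(2), w)
# Rescale the forecast so its portfolio variance equals the realized sum of squares (2.5).
loss_matched = portfolio_qlike(returns, 2.5 * np.eye(2), w)
```

For a fixed realized sum of squares \(s\), the function \(\log v + s/v\) is minimized at \(v = s\), which is why the forecast whose variance matches the realized value attains the lower loss in this example.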

Parameters:

window : int, default=50
    Rolling window length.
title : str, optional
    Custom figure title.

Returns:

fig : go.Figure

summary()

Consolidated summary statistics.

Returns a DataFrame with one row per metric and columns mean, median, std, p5, p95, mad_from_target, and target.

For calibration ratios, the target is 1.0, so mad_from_target is the mean absolute deviation from 1.0.
For portfolio standardized returns, the target mean is 0.0, so mad_from_target is the mean absolute value. The std column corresponds to the bias statistic \(B = \mathrm{std}(b_t)\), whose target is 1.0. Values near 1.0 indicate well-calibrated risk forecasts, values above 1.0 indicate underestimated risk, and values below 1.0 indicate overestimated risk. When only one portfolio is evaluated, the std column is exactly that portfolio’s bias statistic. When multiple portfolios are evaluated, portfolio-level diagnostics are first computed separately for each portfolio and then aggregated by their median. In particular, the std column becomes the median of the per-portfolio bias statistics. See also bias_statistic and bias_statistic_summary.
For QLIKE loss, there is no fixed numeric target. Accordingly, mad_from_target is NaN and target is "lower is better".
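The columns for a single calibration-ratio row can be approximated as follows. This is a sketch of the column definitions only: the function name is hypothetical, `ddof=1` is an assumption, and the median aggregation across multiple portfolios described above is not reproduced.

```python
import numpy as np


def summarize_metric(values, target):
    """One summary row for a per-step metric with a numeric target (hypothetical helper)."""
    values = np.asarray(values, dtype=float)
    return {
        "mean": values.mean(),
        "median": np.median(values),
        "std": values.std(ddof=1),  # assumption: sample standard deviation
        "p5": np.percentile(values, 5),
        "p95": np.percentile(values, 95),
        # Mean absolute deviation from the target, e.g. 1.0 for calibration ratios.
        "mad_from_target": np.mean(np.abs(values - target)),
        "target": target,
    }


row = summarize_metric([0.5, 1.0, 1.5, 2.0], target=1.0)
```

Here the mean of 1.25 sits above the target while mad_from_target is 0.5, showing that the latter captures dispersion around the target that the mean alone would understate.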

Returns:

summary : DataFrame