skfolio.model_selection.CovarianceForecastEvaluation
- class skfolio.model_selection.CovarianceForecastEvaluation(observations, horizon, squared_mahalanobis_distance, mahalanobis_calibration_ratio, diagonal_calibration_ratio, portfolio_standardized_return, portfolio_variance_qlike_loss, n_valid_assets, n_portfolios, name=None)
Out-of-sample covariance forecast evaluation.
Stores per-step calibration diagnostics produced by covariance_forecast_evaluation or online_covariance_forecast_evaluation and provides summary statistics and plots.
The four core diagnostics are:
Mahalanobis calibration ratio: tests whether the full covariance structure (all eigenvalue directions) is correctly specified. At each step, let \(r_t\) be the one-period realized return vector and let \(R^{(h)}\) be the aggregated return over the evaluation window of \(h\) observations. The squared Mahalanobis distance \(d^2 = {R^{(h)}}^\top(h\,\Sigma)^{-1}R^{(h)}\) yields the calibration ratio \(d^2 / n\), where \(n\) is the number of active assets. The target is 1.0. A value above 1.0 indicates underestimated risk; below 1.0 indicates overestimated risk.
Diagonal calibration ratio: tests whether the individual asset variances are correctly specified, ignoring correlations. Computed as \(\frac{1}{n}\sum_i (R_i^{(h)})^2 / (h_i\,\sigma_i^2)\) where \(h_i\) is the number of finite returns for asset \(i\) in the evaluation window. The target is 1.0. A value above 1.0 indicates underestimated volatilities; below 1.0 indicates overestimated volatilities.
Portfolio standardized returns: tests whether the covariance is well calibrated along one or more portfolio directions rather than across all directions. For a portfolio with weights \(w\), the realized portfolio return is standardized by the matching forecast portfolio volatility: \(b = r_p / \hat\sigma_p\) with \(r_p = w^\top R^{(h)}\) and \(\hat\sigma_p^{2} = w^\top(h\,\Sigma)w\). Under correct calibration \(b_t\) has mean 0 and standard deviation 1. The bias statistic \(B = \mathrm{std}(b_t)\) summarizes forecast quality: \(B \approx 1\) is well calibrated, \(B > 1\) indicates underestimated risk, \(B < 1\) indicates overestimated risk.
Portfolio QLIKE: evaluates portfolio variance forecasts along one or more portfolio directions by comparing the forecast portfolio variance with the realized sum of squared portfolio returns over the evaluation window. Lower values indicate better portfolio variance forecasts.
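The four diagnostics above can be illustrated for a single evaluation step with plain NumPy. This is a minimal sketch under stated assumptions, not skfolio's implementation: the covariance matrix, the simulated returns, and the equal-weight test portfolio are all hypothetical, and the window contains no NaNs, so \(h_i = h\) for every asset.

```python
import numpy as np

rng = np.random.default_rng(0)
n_assets, horizon = 4, 5

# Hypothetical one-period forecast covariance (positive definite by construction).
a = rng.normal(size=(n_assets, n_assets))
sigma = 1e-4 * (a @ a.T / n_assets + 0.5 * np.eye(n_assets))

# Simulated one-period returns over the evaluation window: (horizon, n_assets).
returns = rng.multivariate_normal(np.zeros(n_assets), sigma, size=horizon)
aggregated = returns.sum(axis=0)  # R^(h)

# Mahalanobis calibration ratio: d^2 / n with d^2 = R' (h * Sigma)^{-1} R.
d2 = aggregated @ np.linalg.solve(horizon * sigma, aggregated)
mahalanobis_ratio = d2 / n_assets

# Diagonal calibration ratio: mean over assets of R_i^2 / (h * sigma_i^2).
diagonal_ratio = np.mean(aggregated**2 / (horizon * np.diag(sigma)))

# Portfolio standardized return for an equal-weight test portfolio.
weights = np.full(n_assets, 1.0 / n_assets)
forecast_variance = weights @ (horizon * sigma) @ weights
standardized_return = (weights @ aggregated) / np.sqrt(forecast_variance)

# Portfolio QLIKE: log forecast variance plus the realized sum of squared
# one-period portfolio returns divided by the forecast variance.
portfolio_returns = returns @ weights
qlike = np.log(forecast_variance) + np.sum(portfolio_returns**2) / forecast_variance

print(mahalanobis_ratio, diagonal_ratio, standardized_return, qlike)
```

A single step is very noisy; the ratios are only expected to be close to 1.0 on average over many evaluation steps.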
When X_test contains NaNs (e.g. holidays, pre-listing, or post-delisting periods), only finite observations contribute to the aggregated return. For portfolio diagnostics, NaN returns for active assets contribute zero to the realized portfolio return and the forecast covariance is scaled by the pairwise observation count matrix \(H\) (Hadamard product \(H \odot \Sigma\)) so that the realized portfolio variance and forecast variance follow the same missing-data convention. In skfolio, NaN diagonal entries in the forecast covariance mark inactive assets, which are excluded from the evaluation.
When multiple test portfolios are provided, portfolio-level diagnostics are computed for each portfolio independently. The cross-portfolio distribution of bias statistics reveals anisotropic calibration errors that a single portfolio might miss.
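The missing-data convention for portfolio diagnostics can be sketched as follows. The returns, covariance, and weights are hypothetical, and the sketch assumes that \(H_{ij}\) counts the observations in the window where assets \(i\) and \(j\) are both finite; the library's internal construction of \(H\) may differ.

```python
import numpy as np

# Hypothetical evaluation window: 3 observations, 3 active assets, some NaNs.
returns = np.array([
    [0.010, -0.020, 0.005],
    [np.nan, 0.010, -0.010],
    [0.020, np.nan, 0.000],
])
# Hypothetical one-period forecast covariance.
sigma = np.array([
    [4.0e-4, 1.0e-4, 0.5e-4],
    [1.0e-4, 9.0e-4, 2.0e-4],
    [0.5e-4, 2.0e-4, 1.0e-4],
])
weights = np.array([0.5, 0.3, 0.2])

finite = np.isfinite(returns).astype(float)
# Assumption: H[i, j] = number of observations where i and j are both finite.
pairwise_counts = finite.T @ finite

# NaN returns contribute zero to the realized portfolio return.
filled = np.nan_to_num(returns, nan=0.0)
realized_portfolio_return = weights @ filled.sum(axis=0)

# Forecast variance uses the Hadamard product H * Sigma instead of h * Sigma.
forecast_variance = weights @ (pairwise_counts * sigma) @ weights
standardized_return = realized_portfolio_return / np.sqrt(forecast_variance)

print(pairwise_counts)
print(standardized_return)
```

With no missing values every entry of \(H\) equals \(h\), and the expression reduces to \(w^\top(h\,\Sigma)w\).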
- Parameters:
- observations : ndarray of shape (n_steps,)
Time index labels for each evaluation step.
- horizon : int
Number of observations per evaluation window. Every window has exactly this many observations.
- squared_mahalanobis_distance : ndarray of shape (n_steps,)
Squared Mahalanobis distance \(d_t^2 = {R_t^{(h)}}^\top(h\,\Sigma_t)^{-1}R_t^{(h)}\). Under correct Gaussian calibration each value follows a \(\chi^2(n)\) distribution, where \(n\) is the number of active assets.
- mahalanobis_calibration_ratio : ndarray of shape (n_steps,)
\(d_t^2 / n\), where \(n\) is the number of active assets. Target is 1.0. Tests whether the full covariance structure (all eigenvalue directions) is correctly specified.
- diagonal_calibration_ratio : ndarray of shape (n_steps,)
\(\frac{1}{n}\sum_i (R_{i,t}^{(h)})^2 / (h_{i,t}\,\sigma_{i,t}^2)\). Target is 1.0. Tests individual asset variances only.
- portfolio_standardized_return : ndarray of shape (n_steps, n_portfolios)
\(b_t = r_{p,t} / \hat\sigma_{p,t}\). Target mean is 0.0 and target std is 1.0 (the bias statistic).
- portfolio_variance_qlike_loss : ndarray of shape (n_steps, n_portfolios)
\(\log(\hat\sigma_{p,t}^{2}) + \sum_{j=1}^{h} r_{p,t,j}^{2} / \hat\sigma_{p,t}^{2}\). Compares the forecast portfolio variance with the realized sum of squared portfolio returns over the evaluation window. Lower values are better.
- n_valid_assets : ndarray of shape (n_steps,)
Number of active assets used at each evaluation step.
- n_portfolios : int
Number of test portfolios.
- name : str or None, default=None
Display name for the evaluation.
- Attributes:
bias_statistic: Per-portfolio bias statistic.
- name
Methods
bias_statistic_summary(): Cross-portfolio distribution of bias statistics.
exceedance_summary([confidence_levels]): Exceedance rate summary.
plot_calibration([diagnostics, window, title]): Rolling calibration diagnostics over time.
plot_exceedance([confidence_levels, window, ...]): Rolling exceedance rates over time.
plot_qlike_loss([window, title]): Rolling portfolio QLIKE loss over time.
summary(): Consolidated summary statistics.
Examples
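>>> from skfolio.datasets import load_sp500_dataset
>>> from skfolio.model_selection import online_covariance_forecast_evaluation
>>> from skfolio.moments import EWCovariance
>>> from skfolio.preprocessing import prices_to_returns
>>>
>>> # X can be any asset returns dataset; the S&P 500 sample is used for illustration.
>>> X = prices_to_returns(load_sp500_dataset())
>>> evaluation = online_covariance_forecast_evaluation(
...     EWCovariance(half_life=30),
...     X,
...     warmup_size=252,
... )
>>> evaluation.summary()
>>> evaluation.plot_calibration()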
- property bias_statistic
Per-portfolio bias statistic.
Computed as the sample standard deviation of the portfolio standardized returns, \(B_k = \mathrm{std}(b_{k,t})\), taken over the evaluation steps \(t\) for each test portfolio \(k\).
A value near 1.0 indicates well-calibrated risk forecasts. Values above 1.0 indicate underestimated risk; values below 1.0 indicate overestimated risk.
- Returns:
- bias : ndarray of shape (n_portfolios,)
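The behaviour of the bias statistic can be illustrated with simulated data. The sketch below is not the library's code: the volatilities are hypothetical, and it assumes that "sample standard deviation" means `ddof=1`. Three portfolios share the same true volatility, but their forecasts are respectively correct, too low, and too high.

```python
import numpy as np

rng = np.random.default_rng(42)
n_steps, n_portfolios = 2000, 3

true_volatility = 0.01
# Hypothetical forecasts: correct, underestimated, overestimated.
forecast_volatility = np.array([0.0100, 0.0080, 0.0125])

realized = rng.normal(0.0, true_volatility, size=(n_steps, n_portfolios))
standardized = realized / forecast_volatility  # b_{k,t}

# Assumption: sample standard deviation with ddof=1, one value per portfolio.
bias = standardized.std(axis=0, ddof=1)
print(bias)  # roughly 1.0, above 1.0, below 1.0
```

The second portfolio's forecast is 20% too low, so its bias statistic sits near 1.25; the third forecast is 25% too high, which gives a value near 0.8.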
- bias_statistic_summary()
Cross-portfolio distribution of bias statistics.
Computes percentiles of bias statistics across test portfolios. This is useful for evaluating covariance forecast quality using a set of representative portfolios.
Under Gaussian returns with perfect forecasts, \(B^2(T-1)\) follows a \(\chi^2(T-1)\) distribution where \(T\) is the number of evaluation steps. Reference bands can be derived from the appropriate chi-squared quantiles: \(B_{p} = \sqrt{\chi^2_{p}(T-1) / (T-1)}\). In financial return series, heavy tails widen these bands because the sampling variance of \(B\) increases.
- Returns:
- summary : Series
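The Gaussian reference bands \(B_{p} = \sqrt{\chi^2_{p}(T-1) / (T-1)}\) can be computed with SciPy. The helper name `bias_reference_band` and its default quantiles are hypothetical and are not part of skfolio; the sketch only evaluates the formula stated above.

```python
import numpy as np
from scipy.stats import chi2


def bias_reference_band(n_steps, lower=0.05, upper=0.95):
    """Gaussian reference band for the bias statistic over n_steps steps."""
    dof = n_steps - 1
    quantiles = chi2.ppf([lower, upper], dof)
    return np.sqrt(quantiles / dof)


low, high = bias_reference_band(250)
low_long, high_long = bias_reference_band(2500)
print(low, high)
print(low_long, high_long)
```

The band tightens as the number of evaluation steps grows. With heavy-tailed returns these Gaussian bands are too narrow, so values slightly outside them are not necessarily evidence of miscalibration.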
- exceedance_summary(confidence_levels=(0.95, 0.99))
Exceedance rate summary.
Compares squared Mahalanobis distances to \(\chi^2\) thresholds. The rate is sensitive not only to covariance misspecification but also to heavy tails, regime shifts, and non-Gaussian standardized returns. It is best used as a comparative metric across estimators rather than as an absolute calibration test.
- Parameters:
- confidence_levels : tuple of float, default=(0.95, 0.99)
Confidence levels used to define the upper chi-squared thresholds.
- Returns:
- summary : DataFrame
Indexed by confidence_level with columns observed_rate and deviation, where deviation is measured relative to the target exceedance rate \(1 - \text{confidence\_level}\).
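The exceedance computation can be sketched with NumPy and SciPy. This is an illustration under assumptions, not the method's source: the squared distances are simulated from a perfectly calibrated Gaussian model, the per-step asset counts are hypothetical, and `deviation` is assumed to be the plain difference between the observed and target rates.

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(1)
n_steps = 5000

# Hypothetical number of active assets per step (plays the role of n_valid_assets).
n_valid_assets = rng.integers(8, 13, size=n_steps)
# Simulated squared Mahalanobis distances under perfect Gaussian calibration.
squared_distance = rng.chisquare(n_valid_assets)

rows = {}
for level in (0.95, 0.99):
    # Upper chi-squared threshold, one per step because n varies over time.
    threshold = chi2.ppf(level, df=n_valid_assets)
    observed_rate = np.mean(squared_distance > threshold)
    # Assumption: deviation = observed rate minus target rate (1 - level).
    rows[level] = (observed_rate, observed_rate - (1.0 - level))

for level, (observed_rate, deviation) in rows.items():
    print(level, observed_rate, deviation)
```

With real returns the observed rates typically exceed the targets even for a good estimator because of heavy tails, which is why the comparison across estimators is more informative than the absolute level.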
- plot_calibration(diagnostics=('mahalanobis', 'diagonal', 'bias'), window=50, title=None)
Rolling calibration diagnostics over time.
Plots rolling calibration diagnostics with a reference line at 1.0. By default all three diagnostics are shown: rolling mean of the Mahalanobis ratio, rolling mean of the diagonal ratio, and rolling standard deviation of the portfolio standardized return (bias statistic).
For multiple portfolios, the bias statistic shows the median across portfolios with a P5-P95 shaded band.
- Parameters:
- diagnostics : tuple of str, default=("mahalanobis", "diagonal", "bias")
Which diagnostics to include. Valid values are "mahalanobis", "diagonal", and "bias".
- window : int, default=50
Rolling window length.
- title : str, optional
Custom figure title.
- Returns:
- fig : go.Figure
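The rolling quantities behind this plot can be approximated with NumPy. The sketch uses simulated, perfectly calibrated inputs and assumes full windows only and `ddof=1` for the rolling standard deviation; how the library handles partial windows is not stated here.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

rng = np.random.default_rng(7)
n_steps, n_assets, window = 500, 8, 50

# Simulated per-step diagnostics under perfect Gaussian calibration.
mahalanobis_ratio = rng.chisquare(n_assets, size=n_steps) / n_assets
standardized_return = rng.normal(size=n_steps)

# Rolling mean of the calibration ratio (reference line at 1.0).
rolling_ratio = sliding_window_view(mahalanobis_ratio, window).mean(axis=1)
# Rolling standard deviation of standardized returns: the rolling bias statistic.
rolling_bias = sliding_window_view(standardized_return, window).std(axis=1, ddof=1)

print(rolling_ratio[:3], rolling_bias[:3])
```

Both series have `n_steps - window + 1` points and fluctuate around 1.0; sustained departures from 1.0, rather than isolated ones, are what suggest miscalibration.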
- plot_exceedance(confidence_levels=(0.95, 0.99), window=50, title=None)
Rolling exceedance rates over time.
Compares squared Mahalanobis distances to \(\chi^2\) thresholds. The rate is sensitive not only to covariance misspecification but also to heavy tails, regime shifts, and non-Gaussian standardized returns. It is best used as a comparative metric across estimators rather than as an absolute calibration test.
- Parameters:
- confidence_levels : tuple of float, default=(0.95, 0.99)
Confidence levels used to define the upper chi-squared thresholds.
- window : int, default=50
Rolling window length.
- title : str, optional
Custom figure title.
- Returns:
- fig : go.Figure
- plot_qlike_loss(window=50, title=None)
Rolling portfolio QLIKE loss over time.
The QLIKE loss compares the forecast portfolio variance with the realized sum of squared portfolio returns over the evaluation window. Lower values are better.
For multiple portfolios, a shaded band shows the P5-P95 range across portfolios, with a line for the median.
- Parameters:
- window : int, default=50
Rolling window length.
- title : str, optional
Custom figure title.
- Returns:
- fig : go.Figure
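Why lower QLIKE values indicate better forecasts can be seen in a small simulation. The sketch assumes Gaussian portfolio returns with a known true variance and uses the loss \(\log(\hat\sigma_p^2) + \sum_j r_{p,j}^2 / \hat\sigma_p^2\) with \(\hat\sigma_p^2\) equal to \(h\) times a one-period forecast variance; the helper `mean_qlike` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
n_steps, horizon = 20000, 5
true_variance = 1e-4  # hypothetical one-period portfolio variance

# One-period portfolio returns for each evaluation window.
returns = rng.normal(0.0, np.sqrt(true_variance), size=(n_steps, horizon))
realized = (returns**2).sum(axis=1)  # realized sum of squared returns


def mean_qlike(one_period_forecast_variance):
    """Average QLIKE loss for a constant one-period variance forecast."""
    forecast_variance = horizon * one_period_forecast_variance
    return np.mean(np.log(forecast_variance) + realized / forecast_variance)


# Forecasts at half, exactly, and twice the true variance.
losses = {scale: mean_qlike(scale * true_variance) for scale in (0.5, 1.0, 2.0)}
print(losses)
```

The correctly scaled forecast attains the lowest average loss. The loss is also asymmetric: halving the variance is penalized more than doubling it.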
- summary()
Consolidated summary statistics.
Returns a DataFrame with one row per metric and columns mean, median, std, p5, p95, mad_from_target, and target.
For calibration ratios, the target is 1.0, so mad_from_target is the mean absolute deviation from 1.0.
For portfolio standardized returns, the target mean is 0.0, so mad_from_target is the mean absolute value. The std column corresponds to the bias statistic \(B = \mathrm{std}(b_t)\), whose target is 1.0. Values near 1.0 indicate well-calibrated risk forecasts, values above 1.0 indicate underestimated risk, and values below 1.0 indicate overestimated risk.
When only one portfolio is evaluated, the std column is exactly that portfolio's bias statistic. When multiple portfolios are evaluated, portfolio-level diagnostics are first computed separately for each portfolio and then aggregated by their median. In particular, the std column becomes the median of the per-portfolio bias statistics. See also bias_statistic and bias_statistic_summary.
For QLIKE loss, there is no fixed numeric target. Accordingly, mad_from_target is NaN and target is "lower is better".
- Returns:
- summary : DataFrame
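The mad_from_target and std conventions described above can be reproduced by hand on toy data. The arrays below are hypothetical, and the sketch assumes `ddof=1` for the standard deviation; it mirrors the description rather than the method's source code.

```python
import numpy as np

# Calibration ratio over five steps; the target is 1.0.
ratio = np.array([0.8, 1.1, 1.3, 0.9, 1.0])
ratio_mad_from_target = np.mean(np.abs(ratio - 1.0))

# Standardized returns of shape (n_steps, n_portfolios); the target mean is 0.0.
standardized = np.array([
    [0.5, -1.0],
    [-1.5, 2.0],
    [1.0, -1.0],
    [0.0, 0.0],
])

# Diagnostics are computed per portfolio first, then aggregated by the median.
per_portfolio_bias = standardized.std(axis=0, ddof=1)
std_column = np.median(per_portfolio_bias)
per_portfolio_mad = np.mean(np.abs(standardized), axis=0)
mad_column = np.median(per_portfolio_mad)

print(ratio_mad_from_target, std_column, mad_column)
```

With a single portfolio the median is taken over one value, so the std entry is exactly that portfolio's bias statistic.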