Multiple Randomized Cross-Validation#

This tutorial introduces MultipleRandomizedCV, which is based on the “Multiple Randomized Backtests” methodology of Palomar in [1]. This cross-validation strategy performs a Monte Carlo–style evaluation by repeatedly sampling distinct asset subsets (without replacement) and contiguous time windows, then applying an inner walk-forward split to each subsample, capturing both temporal and cross-sectional variability in performance.

In this example, we build a portfolio model composed of a preselection of top performers, followed by a Hierarchical Equal Risk Contribution optimization with covariance shrinkage. We split the dataset into training and testing sets, tune hyperparameters on the training set, and then evaluate the final portfolio models on the test set using MultipleRandomizedCV.

Data Loading#

We load the FTSE 100 dataset, which contains daily prices of 64 assets from the FTSE 100 index, spanning 2000-01-04 to 2023-05-31.

import scipy.stats as stats
from plotly.io import show
from sklearn import set_config
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.pipeline import Pipeline

from skfolio import Population, RatioMeasure, RiskMeasure
from skfolio.datasets import load_ftse100_dataset
from skfolio.metrics import make_scorer
from skfolio.model_selection import MultipleRandomizedCV, WalkForward, cross_val_predict
from skfolio.moments import ShrunkCovariance
from skfolio.optimization import HierarchicalEqualRiskContribution
from skfolio.pre_selection import SelectKExtremes
from skfolio.preprocessing import prices_to_returns
from skfolio.prior import EmpiricalPrior

set_config(transform_output="pandas")

prices = load_ftse100_dataset()
returns = prices_to_returns(prices)

# Sequential train-test split: 67% training, 33% testing.
# `shuffle=False` preserves chronological order, crucial for time-series data.
X_train, X_test = train_test_split(returns, test_size=0.33, shuffle=False)
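
As a rough illustration of what a chronological split does, the sketch below cuts a sequence at a fixed position without shuffling. The helper `sequential_split` and the sample dates are hypothetical. The rounding rule (test size rounded up) is an assumption meant to mirror the call above, not a description of the library's internals.

```python
import math


def sequential_split(rows, test_fraction):
    """Split rows chronologically: earliest rows train, latest rows test."""
    n_test = math.ceil(test_fraction * len(rows))  # assumed rounding rule
    cut = len(rows) - n_test
    return rows[:cut], rows[cut:]


dates = [f"2020-01-{day:02d}" for day in range(1, 9)]  # 8 hypothetical dates
train, test = sequential_split(dates, test_fraction=0.25)
print(train[-1], "<", test[0])  # every training date precedes every test date
```

Because no observation in the test set precedes one in the training set, hyperparameters tuned on the training set never see future data.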

Portfolio Construction#

We build a pipeline that first selects the top-k assets by Sharpe ratio, then allocates weights via Hierarchical Equal Risk Contribution using a shrunk covariance estimator.

pre_selection = SelectKExtremes(k=10, measure=RatioMeasure.SHARPE_RATIO, highest=True)

optimization = HierarchicalEqualRiskContribution(
    prior_estimator=EmpiricalPrior(
        covariance_estimator=ShrunkCovariance(shrinkage=0.5)
    ),
    risk_measure=RiskMeasure.VARIANCE,
)

model_bench = Pipeline(
    [
        ("pre_selection", pre_selection),
        ("optimization", optimization),
    ]
)
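
To make the pre-selection step more concrete, here is a minimal sketch of ranking assets by a Sharpe-like ratio and keeping the top k. It assumes a zero risk-free rate and a sample standard deviation. The function `top_k_by_sharpe` and the return values are hypothetical and do not reproduce the SelectKExtremes implementation.

```python
import statistics


def top_k_by_sharpe(asset_returns, k):
    """Return the k asset names with the highest mean/stdev ratio."""
    sharpe = {
        name: statistics.mean(r) / statistics.stdev(r)
        for name, r in asset_returns.items()
    }
    return sorted(sharpe, key=sharpe.get, reverse=True)[:k]


# Hypothetical daily returns for three assets.
asset_returns = {
    "A": [0.01, 0.02, 0.01, 0.02],
    "B": [0.03, -0.03, 0.03, -0.02],
    "C": [-0.01, 0.00, -0.01, 0.00],
}
selected = top_k_by_sharpe(asset_returns, k=2)
print(selected)
```

Inside a pipeline, this selection is refit on every training window, so the retained assets can change from one rebalancing date to the next.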

Rebalancing Strategy#

We use WalkForward to define a monthly rebalancing (20 trading days), training on the prior year (252 trading days):

walk_forward = WalkForward(test_size=20, train_size=252)
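
The index arithmetic behind such a scheme can be sketched as follows. This is an illustration only: it assumes a rolling (non-expanding) training window, no purging, and that a trailing incomplete test window is dropped. The generator `walk_forward_splits` is a hypothetical helper, not part of skfolio.

```python
def walk_forward_splits(n_observations, train_size, test_size):
    """Yield (train_range, test_range) pairs for a rolling walk-forward."""
    start = 0
    while start + train_size + test_size <= n_observations:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        start += test_size  # roll forward by one rebalancing period


splits = list(walk_forward_splits(n_observations=300, train_size=252, test_size=20))
for train, test in splits:
    print(f"train [{train.start}, {train.stop})  test [{test.start}, {test.stop})")
```

Each test window starts exactly where its training window ends, so consecutive test windows tile the out-of-sample period without overlap.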

Note that WalkForward also supports specific datetime frequencies. For example, we could use walk_forward = WalkForward(test_size=1, train_size=12, freq="WOM-3FRI") to rebalance monthly on the third Friday (WOM-3FRI), training on the prior 12 months.

Hyperparameter Tuning#

Initially, the number of selected assets and the shrinkage parameter were chosen arbitrarily. We use RandomizedSearchCV to explore these parameters and find the combination that maximizes the mean out-of-sample CVaR ratio.

random_search = RandomizedSearchCV(
    estimator=model_bench,
    cv=walk_forward,
    n_jobs=-1,
    param_distributions={
        "pre_selection__k": stats.randint(10, 30),
        "optimization__prior_estimator__covariance_estimator__shrinkage": stats.uniform(
            0, 1
        ),
    },
    n_iter=30,
    random_state=0,
    scoring=make_scorer(RatioMeasure.CVAR_RATIO),
)
random_search.fit(X_train)

# Retrieve the best estimator from the search.
model_tuned = random_search.best_estimator_
model_tuned
Pipeline(steps=[('pre_selection', SelectKExtremes(k=27)),
                ('optimization',
                 HierarchicalEqualRiskContribution(prior_estimator=EmpiricalPrior(covariance_estimator=ShrunkCovariance(shrinkage=np.float64(0.22232138825158765)))))])


In practice, it is recommended to increase n_iter to sample more parameter combinations, then plot those samples to check that the search space is adequately covered and to examine the convergence of training and test performance (see the L1 and L2 Regularization tutorial).
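
For intuition about the scoring objective used in the search, the sketch below computes a simplified historical CVaR ratio: the mean return divided by the average loss in the worst 5% of observations. The function `cvar_ratio` and the sample data are hypothetical. The library's estimator may treat a fractional tail size differently, so this is an approximation of the idea and not its exact formula.

```python
import math
import statistics


def cvar_ratio(returns, beta=0.95):
    """Mean return divided by a simplified historical CVaR at level beta."""
    # Number of observations in the worst (1 - beta) tail; rounding guards
    # against floating-point noise such as 1.0000000000000009.
    n_tail = max(1, math.ceil(round((1 - beta) * len(returns), 9)))
    worst = sorted(returns)[:n_tail]
    cvar = -statistics.mean(worst)  # average tail loss, as a positive number
    return statistics.mean(returns) / cvar


# Hypothetical sample: 19 small gains and one large loss.
sample_returns = [0.01] * 19 + [-0.05]
ratio = cvar_ratio(sample_returns)
print(f"CVaR ratio: {ratio:.3f}")
```

During the search, a score of this kind is computed on each test window and averaged across windows, which is why the tuned parameters favor steady returns with shallow tail losses.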

Standard Walk-Forward Analysis#

We evaluate both the benchmark and tuned models on the test set using standard walk-forward analysis, which yields a single backtest path per model. A single backtest path represents one possible trajectory of cumulative returns under the given rebalancing scheme and parameter set. While easy to compute, it may understate the variability and uncertainty of real-world performance compared to Monte Carlo-based methods.

pred_bench = cross_val_predict(model_bench, X_test, cv=walk_forward)
pred_bench.name = "Benchmark Model"

pred_tuned = cross_val_predict(model_tuned, X_test, cv=walk_forward, n_jobs=-1)
pred_tuned.name = "Tuned Model"

# Combine results for easier analysis.
population = Population([pred_bench, pred_tuned])
population.plot_cumulative_returns()


Display a summary of key performance metrics.

population.summary()
                                    Benchmark Model  Tuned Model
Mean                                0.022%           0.041%
Annualized Mean                     5.52%            10.39%
Variance                            0.013%           0.010%
Annualized Variance                 3.33%            2.60%
Semi-Variance                       0.0072%          0.0056%
Annualized Semi-Variance            1.81%            1.42%
Standard Deviation                  1.15%            1.02%
Annualized Standard Deviation       18.26%           16.13%
Semi-Deviation                      0.85%            0.75%
Annualized Semi-Deviation           13.44%           11.92%
Mean Absolute Deviation             0.82%            0.71%
CVaR at 95%                         2.79%            2.45%
EVaR at 95%                         4.96%            4.87%
Worst Realization                   9.30%            9.29%
CDaR at 95%                         23.22%           17.11%
MAX Drawdown                        38.97%           34.39%
Average Drawdown                    9.70%            4.47%
EDaR at 95%                         27.88%           23.16%
First Lower Partial Moment          0.41%            0.35%
Ulcer Index                         0.12             0.064
Gini Mean Difference                1.21%            1.05%
Value at Risk at 95%                1.86%            1.56%
Drawdown at Risk at 95%             20.31%           12.72%
Entropic Risk Measure at 95%        3.00             3.00
Fourth Central Moment               0.000017%        0.000013%
Fourth Lower Partial Moment         0.000010%        0.000008%
Skew                                -36.24%          -52.21%
Kurtosis                            965.95%          1231.87%
Sharpe Ratio                        0.019            0.041
Annualized Sharpe Ratio             0.30             0.64
Sortino Ratio                       0.026            0.055
Annualized Sortino Ratio            0.41             0.87
Mean Absolute Deviation Ratio       0.027            0.058
First Lower Partial Moment Ratio    0.053            0.12
Value at Risk Ratio at 95%          0.012            0.026
CVaR Ratio at 95%                   0.0078           0.017
Entropic Risk Measure Ratio at 95%  0.000073         0.00014
EVaR Ratio at 95%                   0.0044           0.0085
Worst Realization Ratio             0.0024           0.0044
Drawdown at Risk Ratio at 95%       0.0011           0.0032
CDaR Ratio at 95%                   0.00094          0.0024
Calmar Ratio                        0.00056          0.0012
Average Drawdown Ratio              0.0023           0.0092
EDaR Ratio at 95%                   0.00079          0.0018
Ulcer Index Ratio                   0.0019           0.0065
Gini Mean Difference Ratio          0.018            0.039
Portfolios Number                   85               85
Avg nb of Assets per Portfolio      10.0             27.0
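
The daily and annualized rows of the summary are linked by simple scaling. The sketch below reproduces that relationship for the Benchmark Model, assuming a 252-day annualization factor with means scaled by 252 and deviations by its square root. The inputs are the rounded figures from the table, so small discrepancies with the reported annualized values are expected.

```python
import math

TRADING_DAYS = 252  # assumed annualization factor

# Rounded daily figures for the Benchmark Model, taken from the summary.
daily_mean = 0.022  # percent
daily_std = 1.15  # percent
daily_sharpe = 0.019

annualized_mean = daily_mean * TRADING_DAYS
annualized_std = daily_std * math.sqrt(TRADING_DAYS)
annualized_sharpe = daily_sharpe * math.sqrt(TRADING_DAYS)

print(f"{annualized_mean:.2f}% {annualized_std:.2f}% {annualized_sharpe:.2f}")
```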


Multiple Randomized Cross-Validation#

We perform Monte Carlo-style resampling by drawing 500 subsamples of 50 distinct assets and contiguous 3-year windows (3 x 252 trading days), then applying our walk-forward split to each subsample. This approach captures both temporal and cross-sectional variability.

cv_mc = MultipleRandomizedCV(
    walk_forward=walk_forward,
    n_subsamples=500,
    asset_subset_size=50,
    window_size=3 * 252,
    random_state=0,
)

# Generate cross-validated predictions for both models.
pred_bench_mc = cross_val_predict(
    model_bench,
    X_test,
    cv=cv_mc,
    n_jobs=-1,
    portfolio_params={"tag": "Benchmark Model"},
)

pred_tuned_mc = cross_val_predict(
    model_tuned, X_test, cv=cv_mc, n_jobs=-1, portfolio_params={"tag": "Tuned Model"}
)

# Combine results for easier analysis.
population_mc = pred_bench_mc + pred_tuned_mc
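
To make the resampling step concrete, the following sketch draws random asset subsets (without replacement within a subsample) together with random contiguous window positions. It is a simplified illustration under stated assumptions, not the MultipleRandomizedCV implementation: `draw_subsamples` is a hypothetical helper, the observation count of 1900 is made up, and window starts are drawn uniformly.

```python
import random


def draw_subsamples(n_assets, n_observations, n_subsamples,
                    asset_subset_size, window_size, seed=0):
    """Draw (asset_indices, window_start) pairs for Monte Carlo resampling."""
    rng = random.Random(seed)
    seen = set()
    subsamples = []
    while len(subsamples) < n_subsamples:
        # Assets are drawn without replacement within a subsample.
        assets = tuple(sorted(rng.sample(range(n_assets), asset_subset_size)))
        if assets in seen:  # keep asset subsets distinct across subsamples
            continue
        seen.add(assets)
        start = rng.randint(0, n_observations - window_size)  # inclusive bounds
        subsamples.append((assets, start))
    return subsamples


subsamples = draw_subsamples(
    n_assets=64, n_observations=1900, n_subsamples=5,
    asset_subset_size=50, window_size=3 * 252,
)
for assets, start in subsamples:
    print(f"{len(assets)} assets, window [{start}, {start + 3 * 252})")
```

Each drawn window is then split by the inner walk-forward. Under a rolling scheme with 252 training days and 20 test days, a 756-day window would hold 25 test periods, which are chained into one backtest path.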

Visualization and Analysis#

We plot cumulative returns for the first 10 MultiPeriodPortfolio objects (Monte Carlo paths) of the tuned model. Each MultiPeriodPortfolio concatenates the test (out-of-sample) results from the walk-forward splits of one subsample.

fig = pred_tuned_mc[:10].plot_cumulative_returns(use_tag_in_legend=False)
show(fig)
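
The idea of chaining test windows into a single path can be sketched with plain lists. The return values are hypothetical, and the cumulative series uses a simple (non-compounded) running sum, which is an assumption for illustration. Whether the library compounds returns depends on the portfolio settings.

```python
from itertools import accumulate, chain

# Hypothetical out-of-sample returns from three consecutive test windows.
fold_returns = [
    [0.010, -0.005],
    [0.002, 0.003],
    [-0.004, 0.006],
]

path = list(chain.from_iterable(fold_returns))  # one continuous backtest path
cumulative = list(accumulate(path))  # simple (non-compounded) running sum
print(f"{len(path)} observations, final cumulative return {cumulative[-1]:.3f}")
```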

We now compute and display the distribution of out-of-sample annualized Sharpe ratios:

population_mc.plot_distribution(
    measure_list=[RatioMeasure.ANNUALIZED_SHARPE_RATIO],
    tag_list=["Benchmark Model", "Tuned Model"],
)


for pred in [pred_bench_mc, pred_tuned_mc]:
    tag = pred[0].tag
    mean_sr = pred.measures_mean(measure=RatioMeasure.ANNUALIZED_SHARPE_RATIO)
    std_sr = pred.measures_std(measure=RatioMeasure.ANNUALIZED_SHARPE_RATIO)
    print(f"{tag}\n{'=' * len(tag)}")
    print(f"Average Sharpe Ratio: {mean_sr:0.2f}")
    print(f"Sharpe Ratio Std Dev: {std_sr:0.2f}\n")
Benchmark Model
===============
Average Sharpe Ratio: 0.34
Sharpe Ratio Std Dev: 0.36

Tuned Model
===========
Average Sharpe Ratio: 0.53
Sharpe Ratio Std Dev: 0.34

We plot the asset composition for the first two MultiPeriodPortfolio objects:

pred_tuned_mc[:2].plot_composition(display_sub_ptf_name=False)


We plot the evolution of the weights over time for the first MultiPeriodPortfolio:

pred_tuned_mc[0].plot_weights_per_observation()


Conclusion#

A single-path walk-forward analysis may understate the variability and uncertainty of real-world performance. Multiple Randomized Cross-Validation, by contrast, applies Monte Carlo sampling across asset subsets and time windows, yielding performance estimates that are more robust and less prone to overfitting.

References#

