
Multiple Randomized Cross-Validation

This tutorial introduces MultipleRandomizedCV, which is based on the “Multiple Randomized Backtests” methodology of Palomar [1]. This cross-validation strategy performs a resampling-based evaluation. It repeatedly samples a subset of distinct assets (drawn without replacement) together with a contiguous time window, then applies an inner walk-forward split to each subsample. The resulting collection of backtest paths captures both temporal and cross-sectional variability in performance.
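To make the sampling step concrete, here is a minimal NumPy sketch under our own assumptions: assets are drawn without replacement within each subsample, and each window start is drawn uniformly so that the window fits inside the data. The function `randomized_subsamples` is hypothetical and is not skfolio's implementation; the inner walk-forward split would then be applied to the rows of each window.

```python
import numpy as np


def randomized_subsamples(n_observations, n_assets, n_subsamples,
                          asset_subset_size, window_size, seed=0):
    """Yield (asset_indices, time_indices) for each subsample (conceptual sketch)."""
    rng = np.random.default_rng(seed)
    for _ in range(n_subsamples):
        # Distinct assets: sampled without replacement within one subsample.
        assets = np.sort(rng.choice(n_assets, size=asset_subset_size, replace=False))
        # Contiguous window: random start in [0, n_observations - window_size].
        start = rng.integers(0, n_observations - window_size + 1)
        window = np.arange(start, start + window_size)
        yield assets, window


# Example: 3 subsamples of 50 out of 64 assets over 756-day windows.
for assets, window in randomized_subsamples(1000, 64, 3, 50, 756):
    print(len(assets), window[0], window[-1])
```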

In this example, we build a portfolio model composed of a preselection of top performers, followed by a Hierarchical Equal Risk Contribution optimization with covariance shrinkage. We split the dataset into training and test sets, tune hyperparameters on the training set, and then evaluate the final portfolio models on the test set using MultipleRandomizedCV.

Data Loading

We load the FTSE 100 dataset, which contains daily prices of 64 assets from the FTSE 100 index, spanning 2000-01-04 to 2023-05-31.

import scipy.stats as stats
from plotly.io import show
from sklearn import set_config
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.pipeline import Pipeline

from skfolio import Population, RatioMeasure, RiskMeasure
from skfolio.datasets import load_ftse100_dataset
from skfolio.metrics import make_scorer
from skfolio.model_selection import MultipleRandomizedCV, WalkForward, cross_val_predict
from skfolio.moments import ShrunkCovariance
from skfolio.optimization import HierarchicalEqualRiskContribution
from skfolio.pre_selection import SelectKExtremes
from skfolio.preprocessing import prices_to_returns
from skfolio.prior import EmpiricalPrior

set_config(transform_output="pandas")

prices = load_ftse100_dataset()
returns = prices_to_returns(prices)

# Sequential train-test split: 67% training, 33% testing.
# `shuffle=False` preserves chronological order, crucial for time-series data.
X_train, X_test = train_test_split(returns, test_size=0.33, shuffle=False)
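As a rough illustration of these two preprocessing steps, the pandas sketch below uses hypothetical toy prices. It computes linear returns P_t / P_{t-1} - 1 (which, as we understand it, is what prices_to_returns computes by default, ignoring its NaN handling) and a chronological split that follows scikit-learn's sizing rule for a float test_size (the test set has ceil(test_size * n) rows). The helper `sequential_split` is hypothetical.

```python
import math

import pandas as pd

# Hypothetical toy prices for two assets over four days.
prices = pd.DataFrame(
    {"A": [100.0, 110.0, 99.0, 99.0], "B": [50.0, 50.0, 55.0, 44.0]},
    index=pd.date_range("2020-01-01", periods=4, freq="D"),
)

# Linear returns r_t = P_t / P_{t-1} - 1; the first row has no previous price.
returns = (prices / prices.shift(1) - 1).iloc[1:]


def sequential_split(df, test_size):
    """Chronological split: the last ceil(test_size * n) rows form the test set."""
    n_test = math.ceil(test_size * len(df))
    n_train = len(df) - n_test
    return df.iloc[:n_train], df.iloc[n_train:]


train_part, test_part = sequential_split(returns, test_size=0.33)
```

Because no row is shuffled, every training date precedes every test date, which is what `shuffle=False` guarantees in the tutorial code.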

Portfolio Construction

We build a pipeline that first selects the top-k assets by Sharpe ratio, then allocates weights via Hierarchical Equal Risk Contribution using a shrunk covariance estimator.

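# Benchmark settings (k=10, shrinkage=0.5) are set by hand; they are tuned below.
# Keep the k assets with the highest Sharpe ratio on each training window.
pre_selection = SelectKExtremes(k=10, measure=RatioMeasure.SHARPE_RATIO, highest=True)

# Allocate with HERC, using a shrunk covariance estimate and variance as risk measure.
optimization = HierarchicalEqualRiskContribution(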
    prior_estimator=EmpiricalPrior(
        covariance_estimator=ShrunkCovariance(shrinkage=0.5)
    ),
    risk_measure=RiskMeasure.VARIANCE,
)

model_bench = Pipeline(
    [
        ("pre_selection", pre_selection),
        ("optimization", optimization),
    ]
)
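The first two ingredients of this pipeline can be sketched in NumPy. The hypothetical helpers below rank assets by a non-annualized Sharpe ratio with a zero risk-free rate (the ddof choice does not affect the ranking) and apply the basic shrinkage used by scikit-learn's `ShrunkCovariance`: (1 - shrinkage) * S + shrinkage * (trace(S) / n) * I, with S the maximum-likelihood covariance. This ignores skfolio's extra options (such as `nearest`) and omits the HERC allocation itself.

```python
import numpy as np


def select_top_k_sharpe(returns, k):
    """Column indices of the k assets with the highest Sharpe ratio (zero risk-free rate)."""
    sharpe = returns.mean(axis=0) / returns.std(axis=0, ddof=1)
    # argsort is ascending, so reverse it to rank from best to worst.
    return np.argsort(sharpe)[::-1][:k]


def shrunk_covariance(returns, shrinkage):
    """(1 - shrinkage) * S + shrinkage * (trace(S) / n) * I, with S the MLE covariance."""
    S = np.cov(returns, rowvar=False, bias=True)  # bias=True divides by T
    n_assets = S.shape[0]
    target = np.trace(S) / n_assets * np.eye(n_assets)  # scaled identity
    return (1.0 - shrinkage) * S + shrinkage * target


# Synthetic returns (hypothetical): 500 days, 6 assets.
rng = np.random.default_rng(0)
R = rng.normal(0.0005, 0.01, size=(500, 6))
selected = select_top_k_sharpe(R, k=3)
cov = shrunk_covariance(R[:, selected], shrinkage=0.5)
```

Shrinking toward a scaled identity preserves the trace of S while pulling off-diagonal terms toward zero, which stabilizes the estimate when the training window is short relative to the number of assets.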

Rebalancing Strategy

We use WalkForward to rebalance approximately monthly (every 20 trading days), training on the prior year (252 trading days):

walk_forward = WalkForward(test_size=20, train_size=252)

Note that WalkForward also supports datetime frequencies. For example, walk_forward = WalkForward(test_size=1, train_size=12, freq="WOM-3FRI") would rebalance monthly on the third Friday of each month (WOM-3FRI), training on the prior 12 months.
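The integer-based split can be sketched as a rolling window: train on `train_size` rows, test on the next `test_size` rows, then roll forward by `test_size`. This hypothetical generator assumes that an incomplete final test window is dropped, so the number of splits is floor((n - train_size) / test_size); it is an illustration, not skfolio's WalkForward.

```python
import numpy as np


def walk_forward_splits(n_observations, train_size, test_size):
    """Yield (train_indices, test_indices) for a rolling walk-forward (sketch)."""
    start = 0
    # Stop once a full train + test block no longer fits.
    while start + train_size + test_size <= n_observations:
        train = np.arange(start, start + train_size)
        test = np.arange(start + train_size, start + train_size + test_size)
        yield train, test
        start += test_size  # roll forward by one test period


splits = list(walk_forward_splits(300, train_size=252, test_size=20))
```

Each test window immediately follows its training window, and consecutive test windows tile the data without overlap, so concatenating them gives one out-of-sample path.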

Hyperparameter Tuning

In the benchmark, the number of selected assets (k=10) and the shrinkage intensity (0.5) were set arbitrarily. We use RandomizedSearchCV to explore these two parameters and find the combination that maximizes the mean out-of-sample CVaR ratio across the walk-forward test folds of the training set.

random_search = RandomizedSearchCV(
    estimator=model_bench,
    cv=walk_forward,
    n_jobs=-1,
    param_distributions={
        "pre_selection__k": stats.randint(10, 30),
        "optimization__prior_estimator__covariance_estimator__shrinkage": stats.uniform(
            0, 1
        ),
    },
    n_iter=30,
    random_state=0,
    scoring=make_scorer(RatioMeasure.CVAR_RATIO),
)
random_search.fit(X_train)

# Retrieve the best estimator from the search.
model_tuned = random_search.best_estimator_
model_tuned

Pipeline(steps=[('pre_selection', SelectKExtremes(k=27)),
                ('optimization',
                 HierarchicalEqualRiskContribution(prior_estimator=EmpiricalPrior(covariance_estimator=ShrunkCovariance(shrinkage=np.float64(0.22232138825158765)))))])

In practice, we recommend increasing n_iter to sample more parameter combinations, then plotting those samples to check that the search space is adequately covered and to examine the convergence of training and test performance (see the L1 and L2 Regularization tutorial).
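One way to inspect coverage before running any fits is to draw the candidates directly with scikit-learn's `ParameterSampler`, which RandomizedSearchCV uses internally; with `n_iter=30` and `random_state=0` it should reproduce the candidates evaluated above. The sketch below draws a larger sample, assuming the same distributions: `stats.randint(10, 30)` yields integers 10 to 29 (upper bound exclusive) and `stats.uniform(0, 1)` yields values in [0, 1] (it takes `loc` and `scale`, not lower and upper bounds).

```python
import scipy.stats as stats
from sklearn.model_selection import ParameterSampler

param_distributions = {
    "pre_selection__k": stats.randint(10, 30),
    "optimization__prior_estimator__covariance_estimator__shrinkage": stats.uniform(0, 1),
}

# Draw more candidates than the search used, to see how densely the space is covered.
candidates = list(ParameterSampler(param_distributions, n_iter=200, random_state=0))
ks = [c["pre_selection__k"] for c in candidates]
shrinkages = [
    c["optimization__prior_estimator__covariance_estimator__shrinkage"] for c in candidates
]

print(f"k sampled in [{min(ks)}, {max(ks)}], {len(set(ks))} distinct values")
print(f"shrinkage sampled in [{min(shrinkages):.3f}, {max(shrinkages):.3f}]")
```

The `ks` and `shrinkages` lists can then be scattered against each other to spot gaps in the search space.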

Standard Walk-Forward Analysis

We evaluate both the benchmark and tuned models on the test set using standard walk-forward analysis, which yields a single backtest path per model. A single backtest path represents one possible trajectory of cumulative returns under the given rebalancing scheme and parameter set. While easy to compute, it may understate the variability and uncertainty of real-world performance compared to resampling-based methods.

pred_bench = cross_val_predict(model_bench, X_test, cv=walk_forward)
pred_bench.name = "Benchmark Model"

pred_tuned = cross_val_predict(model_tuned, X_test, cv=walk_forward, n_jobs=-1)
pred_tuned.name = "Tuned Model"

# Combine results for easier analysis.
population = Population([pred_bench, pred_tuned])
population.plot_cumulative_returns()

Display a summary of key performance metrics.

population.summary()

Measure	Benchmark Model	Tuned Model
Mean	0.022%	0.041%
Annualized Mean	5.52%	10.39%
Variance	0.013%	0.010%
Annualized Variance	3.33%	2.60%
Semi-Variance	0.0072%	0.0056%
Annualized Semi-Variance	1.81%	1.42%
Standard Deviation	1.15%	1.02%
Annualized Standard Deviation	18.26%	16.13%
Semi-Deviation	0.85%	0.75%
Annualized Semi-Deviation	13.44%	11.92%
Mean Absolute Deviation	0.82%	0.71%
CVaR at 95%	2.79%	2.45%
EVaR at 95%	4.96%	4.87%
Worst Realization	9.30%	9.29%
CDaR at 95%	23.22%	17.11%
MAX Drawdown	38.97%	34.39%
Average Drawdown	9.70%	4.47%
EDaR at 95%	27.88%	23.16%
First Lower Partial Moment	0.41%	0.35%
Ulcer Index	0.12	0.064
Gini Mean Difference	1.21%	1.05%
Value at Risk at 95%	1.86%	1.56%
Drawdown at Risk at 95%	20.31%	12.72%
Entropic Risk Measure at 95%	3.00	3.00
Fourth Central Moment	0.000017%	0.000013%
Fourth Lower Partial Moment	0.000010%	0.000008%
Skew	-36.24%	-52.21%
Kurtosis	965.95%	1231.87%
Sharpe Ratio	0.019	0.041
Annualized Sharpe Ratio	0.30	0.64
Sortino Ratio	0.026	0.055
Annualized Sortino Ratio	0.41	0.87
Mean Absolute Deviation Ratio	0.027	0.058
First Lower Partial Moment Ratio	0.053	0.12
Value at Risk Ratio at 95%	0.012	0.026
CVaR Ratio at 95%	0.0078	0.017
Entropic Risk Measure Ratio at 95%	0.000073	0.00014
EVaR Ratio at 95%	0.0044	0.0085
Worst Realization Ratio	0.0024	0.0044
Drawdown at Risk Ratio at 95%	0.0011	0.0032
CDaR Ratio at 95%	0.00094	0.0024
Calmar Ratio	0.00056	0.0012
Average Drawdown Ratio	0.0023	0.0092
EDaR Ratio at 95%	0.00079	0.0018
Ulcer Index Ratio	0.0019	0.0065
Gini Mean Difference Ratio	0.018	0.039
Avg nb of Assets per Portfolio	10.0	27.0
Number of Portfolios	85	85
Number of Failed Portfolios	0	0
Number of Fallback Portfolios	0	0
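Several rows of this table follow from simple formulas. The sketch below assumes 252 periods per year, which is consistent with the table (for example, the benchmark's 1.15% daily standard deviation times √252 gives about 18.26%, and its 0.019 Sharpe ratio times √252 gives about 0.30). The helper `summary_measures` is hypothetical; its empirical CVaR averages the worst 5% of returns and may differ slightly from skfolio's estimator (tail interpolation, ddof).

```python
import numpy as np


def summary_measures(returns, beta=0.95, periods_per_year=252):
    """A few summary measures of a return series (sketch; conventions may differ from skfolio)."""
    returns = np.asarray(returns, dtype=float)
    mean = returns.mean()
    std = returns.std(ddof=1)
    # Observations in the (1 - beta) left tail; np.round guards against
    # floating-point noise in (1 - beta) * n before flooring.
    n_tail = max(1, int(np.floor(np.round((1 - beta) * len(returns), 10))))
    cvar = -np.sort(returns)[:n_tail].mean()  # average loss over the worst periods
    return {
        "mean": mean,
        "annualized_mean": mean * periods_per_year,
        "annualized_std": std * np.sqrt(periods_per_year),
        "annualized_sharpe": mean / std * np.sqrt(periods_per_year),
        "cvar": cvar,
        "cvar_ratio": mean / cvar,  # the score used for tuning above
    }
```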

Multiple Randomized Cross-Validation

We perform resampling-based cross-validation by drawing 500 subsamples of 50 distinct assets and contiguous 3-year windows (3 x 252 trading days), then applying our walk-forward split to each subsample. This approach captures both temporal and cross-sectional variability.

cv_mc = MultipleRandomizedCV(
    walk_forward=walk_forward,
    n_subsamples=500,
    asset_subset_size=50,
    window_size=3 * 252,
    random_state=0,
)

# Generate cross-validated predictions for both models.
pred_bench_mc = cross_val_predict(
    model_bench,
    X_test,
    cv=cv_mc,
    n_jobs=-1,
    portfolio_params={"tag": "Benchmark Model"},
)

pred_tuned_mc = cross_val_predict(
    model_tuned, X_test, cv=cv_mc, n_jobs=-1, portfolio_params={"tag": "Tuned Model"}
)

# Combine results for easier analysis.
population_mc = pred_bench_mc + pred_tuned_mc
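A quick calculation shows the scale of this experiment. Assuming the inner WalkForward drops an incomplete final test window, each 756-day window yields floor((756 - 252) / 20) test periods. The sketch below computes that count, the out-of-sample days per path, the number of pipeline fits per model, and how many distinct 50-asset subsets the dataset's 64 assets allow.

```python
from math import comb

n_assets = 64                     # assets in the FTSE 100 dataset
train_size, test_size = 252, 20   # inner WalkForward settings
window_size = 3 * 252             # contiguous window per subsample (756 days)
n_subsamples, asset_subset_size = 500, 50

# Test windows per path, assuming an incomplete final test window is dropped.
n_splits_per_path = (window_size - train_size) // test_size
# Out-of-sample days covered by each resampled path.
oos_days_per_path = n_splits_per_path * test_size
# Pipeline fits per model across all subsamples (one per split).
n_fits_per_model = n_subsamples * n_splits_per_path
# Distinct 50-asset subsets available from 64 assets.
n_possible_subsets = comb(n_assets, asset_subset_size)

print(n_splits_per_path, oos_days_per_path, n_fits_per_model)  # 25 500 12500
```

This also explains why `n_jobs=-1` matters here: each model is refit thousands of times.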

Visualization and Analysis

We plot the cumulative returns of the first 10 MultiPeriodPortfolios (resampled paths) of the tuned model. Each MultiPeriodPortfolio concatenates the out-of-sample test portfolios produced by the inner walk-forward on one subsample.

fig = pred_tuned_mc[:10].plot_cumulative_returns(use_tag_in_legend=False)
show(fig)
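Conceptually, each resampled path is the time-ordered concatenation of its test-window returns. The NumPy sketch below uses hypothetical returns for three consecutive test windows and shows both the summed (non-compounded) and the compounded conventions for cumulative returns; which convention a given plot uses depends on the portfolio's settings.

```python
import numpy as np

# Hypothetical out-of-sample daily returns from three consecutive test windows
# of one resampled path (in practice, each comes from a model refit on its own
# training window).
test_window_returns = [
    np.array([0.01, -0.02, 0.005]),
    np.array([0.0, 0.01]),
    np.array([-0.01, 0.02]),
]

# The path is the time-ordered concatenation of its test windows.
path_returns = np.concatenate(test_window_returns)

cumulative_simple = np.cumsum(path_returns)                   # summed (non-compounded)
cumulative_compounded = np.cumprod(1.0 + path_returns) - 1.0  # compounded
```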

We now compute and display the distribution of out-of-sample annualized Sharpe ratios:

population_mc.plot_distribution(
    measure_list=[RatioMeasure.ANNUALIZED_SHARPE_RATIO],
    tag_list=["Benchmark Model", "Tuned Model"],
)

for pred in [pred_bench_mc, pred_tuned_mc]:
    tag = pred[0].tag
    mean_sr = pred.measures_mean(measure=RatioMeasure.ANNUALIZED_SHARPE_RATIO)
    std_sr = pred.measures_std(measure=RatioMeasure.ANNUALIZED_SHARPE_RATIO)
    print(f"{tag}\n{'=' * len(tag)}")
    print(f"Average Sharpe Ratio: {mean_sr:0.2f}")
    print(f"Sharpe Ratio Std Dev: {std_sr:0.2f}\n")

Benchmark Model
===============
Average Sharpe Ratio: 0.34
Sharpe Ratio Std Dev: 0.36

Tuned Model
===========
Average Sharpe Ratio: 0.53
Sharpe Ratio Std Dev: 0.34
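Because both models were evaluated with the same `cv_mc` and a fixed `random_state`, it seems reasonable to assume (we have not verified it here) that path i of each model was built from the same subsample. Under that assumption, a paired comparison is more informative than comparing two marginal distributions. The hypothetical helper below computes the mean paired difference, a naive standard error, and the fraction of paths where the tuned model wins; the standard error treats paths as independent, although subsamples overlap in time and assets, so it is likely too optimistic.

```python
import numpy as np


def paired_comparison(sr_bench, sr_tuned):
    """Compare per-path Sharpe ratios, assuming path i of both models used the same subsample."""
    sr_bench = np.asarray(sr_bench, dtype=float)
    sr_tuned = np.asarray(sr_tuned, dtype=float)
    diff = sr_tuned - sr_bench
    return {
        "mean_diff": diff.mean(),
        # Naive: ignores dependence between overlapping subsamples.
        "naive_std_err": diff.std(ddof=1) / np.sqrt(len(diff)),
        # Fraction of paths where the tuned model has the higher Sharpe ratio.
        "win_rate": float(np.mean(diff > 0)),
    }


# Per-path values could come from each portfolio's annualized_sharpe_ratio
# attribute; here we use toy numbers for illustration only.
result = paired_comparison([0.1, 0.2, 0.3], [0.2, 0.1, 0.5])
```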

Let’s display a box plot of the CVaR ratio:

population_mc.boxplot_measure(
    measure=RatioMeasure.CVAR_RATIO, tag_list=["Benchmark Model", "Tuned Model"]
)

We plot the asset composition of the first two MultiPeriodPortfolios:

pred_tuned_mc[:2].plot_composition(display_sub_ptf_name=False)

We plot the evolution of the weights over time for the first MultiPeriodPortfolio:

pred_tuned_mc[0].plot_weights_per_observation()

Conclusion

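A single-path walk-forward analysis may understate the variability and uncertainty of real-world performance. Multiple Randomized Cross-Validation, by contrast, evaluates the strategy across many asset subsets and time windows and yields a distribution of performance measures rather than a single point estimate. This makes conclusions less dependent on one particular historical path or asset universe, and therefore more robust and less prone to overfitting. In this example, the tuned model's average annualized Sharpe ratio across paths (0.53) exceeds the benchmark's (0.34), but the standard deviations (0.34 and 0.36) are larger than that gap, a dispersion that a single backtest path cannot reveal.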

References

Total running time of the script: (6 minutes 9.861 seconds)
