skfolio.model_selection.MultipleRandomizedCV
- class skfolio.model_selection.MultipleRandomizedCV(walk_forward, n_subsamples, asset_subset_size, window_size=None, random_state=None)
Multiple Randomized Cross-Validation.
Based on the “Multiple Randomized Backtests” methodology of Palomar [1], this cross-validation strategy performs a Monte Carlo-style evaluation: it repeatedly samples distinct asset subsets (without replacement) and contiguous time windows, then applies an inner walk-forward split to each subsample. The resulting folds capture both temporal and cross-sectional variability in performance.
On each of the n_subsamples iterations, the following actions are performed:
1. Randomly pick a contiguous time window of length window_size (or the full history if None).
2. Randomly pick an asset subset of size asset_subset_size (without replacement).
3. Run a walk-forward split (via the supplied walk_forward object) on that sub-dataset.
4. Yield (train_indices, test_indices, asset_indices) for each inner split.
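The four steps above can be sketched with NumPy alone. This is a minimal, non-authoritative sketch: walk_forward_indices and randomized_splits are hypothetical helpers, the inner split is a simplified rolling split with fixed train and test sizes (not skfolio's WalkForward), and asset subsets are drawn independently per iteration, so unlike MultipleRandomizedCV the same subset may repeat across draws.

```python
import numpy as np


def walk_forward_indices(n_obs, train_size, test_size):
    """Hypothetical rolling walk-forward: fixed-size train block followed by a test block."""
    splits = []
    start = 0
    while start + train_size + test_size <= n_obs:
        train = np.arange(start, start + train_size)
        test = np.arange(start + train_size, start + train_size + test_size)
        splits.append((train, test))
        start += test_size  # roll forward by one test block
    return splits


def randomized_splits(n_obs, n_assets, n_subsamples, asset_subset_size,
                      train_size, test_size, window_size=None, seed=None):
    """Hypothetical sketch yielding (train, test, assets) for each inner split."""
    rng = np.random.default_rng(seed)
    size = n_obs if window_size is None else window_size
    for _ in range(n_subsamples):
        # Step 1: random contiguous window [offset, offset + size).
        offset = int(rng.integers(0, n_obs - size + 1))
        # Step 2: asset_subset_size distinct assets (sorted for readability).
        assets = np.sort(rng.choice(n_assets, size=asset_subset_size, replace=False))
        # Step 3: inner walk-forward on the window, shifted to global row indices.
        for train, test in walk_forward_indices(size, train_size, test_size):
            # Step 4: yield one fold per inner split.
            yield train + offset, test + offset, assets


folds = list(randomized_splits(n_obs=4, n_assets=5, n_subsamples=2,
                               asset_subset_size=3, train_size=2, test_size=1,
                               seed=0))
for train, test, assets in folds:
    print(train, test, assets)
```

With 4 observations, train_size=2 and test_size=1, each subsample contributes two inner splits, so two subsamples give four folds, the same fold count as the first doctest example on this page.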
Each asset subset is sampled without replacement (assets within each subset are distinct), and no subset is repeated across the n_subsamples draws. A combinatorial unranking algorithm maps each sampled rank directly to its combination, so the n_subsamples subsets are obtained in O(n_subsamples * asset_subset_size) time and space without generating or storing all \(M=\binom{n\_assets}{asset\_subset\_size}\) subsets. When \(M\) is small enough that n_subsamples equals \(M\), every possible asset universe is drawn exactly once. Because ranks are drawn without replacement from a finite population of size \(M\), the variance of the Monte Carlo sample mean is reduced by the finite-population correction factor \(\tfrac{M - n\_subsamples}{M - 1}\).
- Parameters:
- walk_forward : WalkForward
A WalkForward CV object to be applied to each subsample.
- n_subsamples : int
Number of subsamples (sub-datasets) to draw. Each subsample is a (time window × asset subset) pair on which the inner walk-forward is run.
- asset_subset_size : int
Number of assets to include in each subsample. Must be less than or equal to the total number of assets.
- window_size : int or None, default=None
Length of the contiguous time slice (number of observations) for each subsample. If None, all observations are used in every draw.
- random_state : int, RandomState instance or None, default=None
Seed or random state to ensure reproducibility.
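The distinct-subset sampling described earlier (draw distinct ranks in \([0, M)\), then unrank each one into a combination) can be sketched in plain Python, together with the finite-population correction factor. This is a minimal sketch under stated assumptions, not skfolio's implementation: unrank_combination, sample_asset_subsets and finite_population_correction are hypothetical helpers, ranks are deduplicated by simple rejection sampling, and this lexicographic unranking scans candidate assets, so it costs O(n_assets) per draw rather than the O(asset_subset_size) stated for the class.

```python
import random
from math import comb


def unrank_combination(rank, n, k):
    """Return the rank-th k-subset of range(n) in lexicographic order."""
    combo = []
    x = 0  # smallest asset index still eligible for the current position
    for i in range(k):
        remaining = k - i - 1  # positions left to fill after this one
        while True:
            # Number of combinations that put asset x at position i.
            count = comb(n - x - 1, remaining)
            if rank < count:
                combo.append(x)
                x += 1
                break
            rank -= count  # skip all combinations with asset x at position i
            x += 1
    return combo


def sample_asset_subsets(n_assets, asset_subset_size, n_subsamples, seed=None):
    """Draw n_subsamples distinct subsets by sampling distinct ranks in [0, M)."""
    m = comb(n_assets, asset_subset_size)
    if n_subsamples > m:
        raise ValueError(f"only {m} distinct subsets exist")
    rng = random.Random(seed)
    seen, ranks = set(), []
    while len(ranks) < n_subsamples:
        r = rng.randrange(m)  # randrange accepts arbitrarily large integers
        if r not in seen:  # rejection step: ranks are drawn without replacement
            seen.add(r)
            ranks.append(r)
    return [unrank_combination(r, n_assets, asset_subset_size) for r in ranks]


def finite_population_correction(m, n_subsamples):
    """Variance multiplier (M - n) / (M - 1); assumes M > 1."""
    return (m - n_subsamples) / (m - 1)


print(sample_asset_subsets(n_assets=5, asset_subset_size=3, n_subsamples=2, seed=0))
print(finite_population_correction(comb(5, 3), 4))  # (10 - 4) / 9
```

With \(M=\binom{5}{3}=10\) and n_subsamples=4, the correction factor is 6/9 ≈ 0.67; with n_subsamples=10 it drops to 0, because every subset is then visited and the sample mean equals the population mean.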
References
[1] “Portfolio Optimization, Theory and Application”, Chapter 8, Daniel P. Palomar (2025).
Examples
>>> import numpy as np
>>> from skfolio.datasets import load_sp500_dataset
>>> from skfolio.model_selection import WalkForward, MultipleRandomizedCV
>>> from skfolio.preprocessing import prices_to_returns
>>>
>>> X = np.random.randn(4, 5)  # 4 observations and 5 assets.
>>> # Draw 2 subsamples (sub-datasets) with 3 assets chosen randomly among the 5.
>>> # For each subsample, run a walk-forward.
>>> # Use the full time series (no time resampling).
>>> cv = MultipleRandomizedCV(
...     walk_forward=WalkForward(test_size=1, train_size=2),
...     n_subsamples=2,
...     asset_subset_size=3,
...     window_size=None,
...     random_state=0,
... )
>>> for i, (train_index, test_index, assets) in enumerate(cv.split(X)):
...     print(f"Fold {i}:")
...     print(f"  Train: index={train_index}")
...     print(f"  Test: index={test_index}")
...     print(f"  Assets: columns={assets}")
Fold 0:
  Train: index=[0 1]
  Test: index=[2]
  Assets: columns=[0 1 4]
Fold 1:
  Train: index=[1 2]
  Test: index=[3]
  Assets: columns=[0 1 4]
Fold 2:
  Train: index=[0 1]
  Test: index=[2]
  Assets: columns=[1 3 4]
Fold 3:
  Train: index=[1 2]
  Test: index=[3]
  Assets: columns=[1 3 4]
>>> print(f"Path ids: {cv.get_path_ids()}")
Path ids: [0 0 1 1]
>>>
>>> # Random contiguous time slice of 4 observations among 10 observations.
>>> X = np.random.randn(10, 5)  # 10 observations and 5 assets.
>>> cv = MultipleRandomizedCV(
...     walk_forward=WalkForward(test_size=1, train_size=2),
...     n_subsamples=2,
...     asset_subset_size=3,
...     window_size=4,
...     random_state=0,
... )
>>> for i, (train_index, test_index, assets) in enumerate(cv.split(X)):
...     print(f"Fold {i}:")
...     print(f"  Train: index={train_index}")
...     print(f"  Test: index={test_index}")
...     print(f"  Assets: columns={assets}")
Fold 0:
  Train: index=[4 5]
  Test: index=[6]
  Assets: columns=[0 1 4]
Fold 1:
  Train: index=[5 6]
  Test: index=[7]
  Assets: columns=[0 1 4]
Fold 2:
  Train: index=[5 6]
  Test: index=[7]
  Assets: columns=[1 3 4]
Fold 3:
  Train: index=[6 7]
  Test: index=[8]
  Assets: columns=[1 3 4]
>>>
>>> # Walk-forward with time-based (calendar) rebalancing.
>>> # Rebalance every 3 months on the third Friday, and train on the last 12 months.
>>> prices = load_sp500_dataset()
>>> X = prices_to_returns(prices)
>>> X = X["2021":"2022"]
>>> cv = MultipleRandomizedCV(
...     walk_forward=WalkForward(test_size=3, train_size=12, freq="WOM-3FRI"),
...     n_subsamples=2,
...     asset_subset_size=3,
...     window_size=None,
...     random_state=0,
... )
>>> for i, (train_index, test_index, assets) in enumerate(cv.split(X)):
...     print(f"Fold {i}:")
...     print(f"  Train: size={len(train_index)}")
...     print(f"  Test: size={len(test_index)}")
...     print(f"  Assets: columns={assets}")
Fold 0:
  Train: size=256
  Test: size=59
  Assets: columns=[ 9 16 17]
Fold 1:
  Train: size=253
  Test: size=61
  Assets: columns=[ 9 16 17]
Fold 2:
  Train: size=251
  Test: size=69
  Assets: columns=[ 9 16 17]
Fold 3:
  Train: size=256
  Test: size=59
  Assets: columns=[ 7 10 14]
Fold 4:
  Train: size=253
  Test: size=61
  Assets: columns=[ 7 10 14]
Fold 5:
  Train: size=251
  Test: size=69
  Assets: columns=[ 7 10 14]
>>> print(f"Path ids: {cv.get_path_ids()}")
Path ids: [0 0 0 1 1 1]
Methods
- get_path_ids() : Return the path id of each test set in each split.
- split(X[, y]) : Generate indices to split data into training and test sets.
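Judging from the examples, where every fold drawn from the same subsample shares a path id, get_path_ids() behaves like repeating each subsample index once per inner split. The sketch below reproduces that pattern and groups test folds into one out-of-sample path per subsample. It is an illustration under that assumption, not skfolio's code: path_ids_from_counts is a hypothetical helper and the test-fold arrays are made up.

```python
import numpy as np


def path_ids_from_counts(inner_split_counts):
    """Hypothetical helper: repeat each subsample's index once per inner split."""
    counts = np.asarray(inner_split_counts)
    return np.repeat(np.arange(len(counts)), counts)


# Two subsamples with three inner splits each, as in the calendar-based example.
path_ids = path_ids_from_counts([3, 3])
print(path_ids)  # [0 0 0 1 1 1]

# Hypothetical per-fold test indices, one array per fold, in split() order.
test_folds = [np.array([6]), np.array([7]), np.array([8]),
              np.array([6]), np.array([7]), np.array([8])]

# Concatenate the test folds sharing a path id into one out-of-sample path.
paths = {
    int(p): np.concatenate([t for t, pid in zip(test_folds, path_ids) if pid == p])
    for p in np.unique(path_ids)
}
print(paths)
```

Passing per-subsample counts rather than a single integer keeps the sketch valid if, with calendar-based splits and a finite window_size, subsamples were to yield different numbers of inner splits.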
- split(X, y=None)
Generate indices to split data into training and test sets.
- Parameters:
- X : array-like of shape (n_observations, n_assets)
Price returns of the assets.
- y : array-like of shape (n_observations, n_targets)
Always ignored, exists for compatibility.
- Yields:
- train : ndarray
The training set indices for that split.
- test : ndarray
The testing set indices for that split.
- assets : ndarray
The asset indices for that split.
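Each yielded triple indexes rows (observations) and columns (assets) of X at the same time. A minimal NumPy sketch, using a hypothetical fold and synthetic returns rather than real split() output, shows one way to slice the matching training and test sub-matrices:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((10, 5))  # 10 observations, 5 assets (synthetic returns)

# Hypothetical fold, shaped like one (train, test, assets) triple yielded by split().
train, test, assets = np.array([4, 5]), np.array([6]), np.array([0, 1, 4])

# np.ix_ builds an open mesh so the row and column selections combine.
X_train = X[np.ix_(train, assets)]  # rows 4-5, assets 0, 1, 4
X_test = X[np.ix_(test, assets)]    # row 6, assets 0, 1, 4
print(X_train.shape, X_test.shape)
```

For a pandas DataFrame such as the output of prices_to_returns, X.iloc[train, assets] performs the same positional selection.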