skfolio.model_selection.WalkForward
- class skfolio.model_selection.WalkForward(test_size, train_size, freq=None, freq_offset=None, previous=False, expend_train=False, reduce_test=False, purged_size=0)
Walk Forward Cross-Validator.
Provides train/test indices to split time series data samples using a walk-forward logic.
In each split, the test indices must be higher than those of the previous split; therefore, shuffling is inappropriate for this cross-validator.
Compared to sklearn.model_selection.TimeSeriesSplit, you control the train/test folds by specifying the number of training and test samples instead of the number of splits, which makes it more suitable for portfolio cross-validation.
If your data is a DataFrame indexed with a DatetimeIndex, you can split the data using specific datetime frequencies and offsets.
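To make the contrast with TimeSeriesSplit concrete, here is a minimal pure-Python sketch that only generates index lists. Both helpers are hypothetical. `timeseries_split_like` mirrors scikit-learn's default TimeSeriesSplit logic, with equal test sets of n_observations // (n_splits + 1) observations and an expanding training set. `walk_forward_like` assumes the simplest WalkForward setting: integer sizes, freq=None, and no purging. Neither is skfolio's actual code.

```python
def timeseries_split_like(n_observations: int, n_splits: int):
    """Hypothetical sketch of TimeSeriesSplit's default index logic."""
    # The number of splits is fixed; the test size is derived from it.
    test_size = n_observations // (n_splits + 1)
    folds = []
    for k in range(n_splits):
        test_start = n_observations - (n_splits - k) * test_size
        # Expanding window: training always starts at observation 0.
        train = list(range(0, test_start))
        test = list(range(test_start, test_start + test_size))
        folds.append((train, test))
    return folds


def walk_forward_like(n_observations: int, train_size: int, test_size: int):
    """Hypothetical sketch of a rolling walk-forward split with integer sizes."""
    # The sizes are fixed; the number of splits is derived from them.
    folds = []
    start = 0
    while start + train_size + test_size <= n_observations:
        train = list(range(start, start + train_size))
        test = list(range(start + train_size, start + train_size + test_size))
        folds.append((train, test))
        # Roll forward by one test period so test sets never overlap.
        start += test_size
    return folds


for train, test in timeseries_split_like(6, n_splits=3):
    print("TimeSeriesSplit-like:", train, test)
for train, test in walk_forward_like(6, train_size=2, test_size=1):
    print("WalkForward-like:", train, test)
```

When sizes are the inputs, the number of folds follows from the data length instead of being chosen up front. Every training window also keeps the same length, which is the usual setup for rolling portfolio backtests.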
- Parameters:
- test_size : int
Length of each test set. If freq is None (default), it represents the number of observations. Otherwise, it represents the number of periods defined by freq.
- train_size : int | pandas.offsets.DateOffset | datetime.timedelta
Length of each training set. If freq is None (default), it represents the number of observations. Otherwise, for integers, it represents the number of periods defined by freq; for a pandas DateOffset or a datetime timedelta, it represents the date offset applied to the start of each period.
- freq : str | pandas.offsets.DateOffset, optional
If provided, it must be a frequency string or a pandas DateOffset, and the returns X must be a DataFrame with an index of type DatetimeIndex. For a list of pandas frequencies and offsets, see the pandas documentation. The default (None) means that test_size and train_size represent the number of observations.
Below are some common examples:
>>> import datetime as dt
>>> import pandas as pd
>>> from skfolio.model_selection import WalkForward
Rebalancing: Monthly on the first day
Test duration: 1 month
Train duration: 6 months
>>> cv = WalkForward(test_size=1, train_size=6, freq="MS")
Rebalancing: Quarterly on the first day
Test duration: 1 quarter
Train duration: 2 months
>>> cv = WalkForward(test_size=1, train_size=pd.DateOffset(months=2), freq="QS")
Rebalancing: Monthly on the third Friday
Test duration: 1 month
Train duration: 6 weeks
>>> cv = WalkForward(test_size=1, train_size=pd.offsets.Week(6), freq="WOM-3FRI")
Rebalancing: Semi-monthly, on the 15th and on the last day of each month
Test duration: half a month (1 semi-month period)
Train duration: about 1 month (2 semi-month periods)
>>> cv = WalkForward(test_size=1, train_size=2, freq=pd.offsets.SemiMonthEnd())
Rebalancing: Every 2 months on the third day (month start shifted by 2 days)
Test duration: 2 months
Train duration: 6 months
>>> cv = WalkForward(test_size=2, train_size=6, freq="MS", freq_offset=dt.timedelta(days=2))
- freq_offset : pandas DateOffset | datetime timedelta, optional
Only used if freq is provided. Shifts the period boundaries defined by freq by a pandas DateOffset or a datetime timedelta.
- previous : bool, default=False
Only used if freq is provided. If set to True and the period start or period end is not in the DatetimeIndex, the previous observation is used; otherwise, the next observation is used (default).
- expend_train : bool, default=False
If set to True, every training set after the first one starts at the first observation and therefore uses all past observations (expanding window). The default is False.
- reduce_test : bool, default=False
If set to True, the last train/test split is returned even if its test set is partial (i.e., it contains fewer observations than test_size); otherwise, it is ignored. The default is False.
The number of observations to skip between the end of each training set and the start of the following test set; these observations belong to neither set of that split. The default is 0.
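The interaction of freq, freq_offset and previous can be illustrated with a small stdlib-only sketch. It assumes the behaviour described above: period boundaries are built from freq, shifted by freq_offset, and then mapped to the next available observation, or to the previous one when previous=True. `month_starts` and `snap` are hypothetical helpers. skfolio itself works on a pandas DatetimeIndex, and its internals may differ.

```python
import bisect
import datetime as dt


def month_starts(first: dt.date, last: dt.date) -> list[dt.date]:
    """Hypothetical helper: first day of every month from first to last."""
    year, month = first.year, first.month
    starts = []
    while (year, month) <= (last.year, last.month):
        starts.append(dt.date(year, month, 1))
        month += 1
        if month == 13:
            year, month = year + 1, 1
    return starts


def snap(dates: list[dt.date], boundary: dt.date, previous: bool = False):
    """Hypothetical helper: position of `boundary` in the sorted `dates`.

    If `boundary` is not an observation date, fall back to the next
    observation (default) or to the previous one (previous=True).
    Returns None when no such observation exists.
    """
    pos = bisect.bisect_left(dates, boundary)
    if pos < len(dates) and dates[pos] == boundary:
        return pos
    if previous:
        return pos - 1 if pos > 0 else None
    return pos if pos < len(dates) else None


# Observation dates: every weekday from 2024-01-01 to 2024-04-30.
first, last = dt.date(2024, 1, 1), dt.date(2024, 4, 30)
all_days = [first + dt.timedelta(days=k) for k in range((last - first).days + 1)]
dates = [d for d in all_days if d.weekday() < 5]  # Monday=0 ... Friday=4

# Monthly periods starting on the first day (as with freq="MS"),
# shifted by freq_offset=dt.timedelta(days=2): the third day of each month.
freq_offset = dt.timedelta(days=2)
boundaries = [m + freq_offset for m in month_starts(first, last)]

next_obs = [dates[snap(dates, b)] for b in boundaries]
prev_obs = [dates[snap(dates, b, previous=True)] for b in boundaries]
print("next:", next_obs)
print("previous:", prev_obs)
```

In 2024, the 3rd of February and the 3rd of March fall on a weekend. By default those boundaries move to the following Monday (February 5, March 4). With previous=True they move to the preceding Friday (February 2, March 1).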
Examples
>>> import numpy as np
>>> from skfolio.model_selection import WalkForward
>>> X = np.random.randn(6, 2)
>>> cv = WalkForward(test_size=1, train_size=2)
>>> for i, (train_index, test_index) in enumerate(cv.split(X)):
...     print(f"Fold {i}:")
...     print(f" Train: index={train_index}")
...     print(f" Test: index={test_index}")
Fold 0:
 Train: index=[0 1]
 Test: index=[2]
Fold 1:
 Train: index=[1 2]
 Test: index=[3]
Fold 2:
 Train: index=[2 3]
 Test: index=[4]
Fold 3:
 Train: index=[3 4]
 Test: index=[5]
>>> cv = WalkForward(test_size=1, train_size=2, purged_size=1)
>>> for i, (train_index, test_index) in enumerate(cv.split(X)):
...     print(f"Fold {i}:")
...     print(f" Train: index={train_index}")
...     print(f" Test: index={test_index}")
Fold 0:
 Train: index=[0 1]
 Test: index=[3]
Fold 1:
 Train: index=[1 2]
 Test: index=[4]
Fold 2:
 Train: index=[2 3]
 Test: index=[5]
>>> cv = WalkForward(test_size=2, train_size=3)
>>> for i, (train_index, test_index) in enumerate(cv.split(X)):
...     print(f"Fold {i}:")
...     print(f" Train: index={train_index}")
...     print(f" Test: index={test_index}")
Fold 0:
 Train: index=[0 1 2]
 Test: index=[3 4]
>>> cv = WalkForward(test_size=2, train_size=3, reduce_test=True)
>>> for i, (train_index, test_index) in enumerate(cv.split(X)):
...     print(f"Fold {i}:")
...     print(f" Train: index={train_index}")
...     print(f" Test: index={test_index}")
Fold 0:
 Train: index=[0 1 2]
 Test: index=[3 4]
Fold 1:
 Train: index=[2 3 4]
 Test: index=[5]
>>> cv = WalkForward(test_size=2, train_size=3, expend_train=True, reduce_test=True)
>>> for i, (train_index, test_index) in enumerate(cv.split(X)):
...     print(f"Fold {i}:")
...     print(f" Train: index={train_index}")
...     print(f" Test: index={test_index}")
Fold 0:
 Train: index=[0 1 2]
 Test: index=[3 4]
Fold 1:
 Train: index=[0 1 2 3 4]
 Test: index=[5]
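The folds printed in these examples can be reproduced with a pure-Python sketch of the integer-size case (freq=None). `walk_forward` is a hypothetical re-implementation written against the documented outputs, not skfolio's source. It assumes that the window always rolls forward by test_size and that purged observations sit between each training set and its test set.

```python
def walk_forward(n_observations, train_size, test_size, purged_size=0,
                 expend_train=False, reduce_test=False):
    """Hypothetical sketch: (train, test) index lists for integer sizes."""
    folds = []
    start = 0
    while True:
        train_end = start + train_size        # exclusive end of the training window
        test_start = train_end + purged_size  # skip the purged observations
        test_end = test_start + test_size
        if test_start >= n_observations:
            break                             # no test observation left
        if test_end > n_observations:
            if not reduce_test:
                break                         # drop the partial last test set
            test_end = n_observations         # keep a shortened last test set
        # Expanding training set: always start from the first observation.
        train_start = 0 if expend_train else start
        folds.append((list(range(train_start, train_end)),
                      list(range(test_start, test_end))))
        start += test_size                    # roll forward by one test period
    return folds


for kwargs in [
    dict(test_size=1, train_size=2),
    dict(test_size=1, train_size=2, purged_size=1),
    dict(test_size=2, train_size=3),
    dict(test_size=2, train_size=3, reduce_test=True),
    dict(test_size=2, train_size=3, expend_train=True, reduce_test=True),
]:
    print(kwargs, walk_forward(6, **kwargs))
```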
Methods
- get_metadata_routing() : Get metadata routing of this object.
- get_n_splits([X, y, groups]) : Returns the number of splitting iterations in the cross-validator.
- split(X[, y, groups]) : Generate indices to split data into training and test sets.
- get_metadata_routing()
Get metadata routing of this object.
Please check the User Guide on how the routing mechanism works.
- Returns:
- routing : MetadataRequest
A MetadataRequest encapsulating routing information.
- get_n_splits(X=None, y=None, groups=None)
Returns the number of splitting iterations in the cross-validator.
- Parameters:
- X : array-like of shape (n_observations, n_assets)
Price returns of the assets.
- y : array-like of shape (n_observations, n_targets)
Always ignored, exists for compatibility.
- groups : array-like of shape (n_observations,)
Always ignored, exists for compatibility.
- Returns:
- n_folds : int
The number of splitting iterations in the cross-validator.
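For integer sizes (freq=None), the fold count can be reasoned about in closed form. Consider the observations left after the first training window and the purge gap. Their number, divided by test_size, gives the count: floor division when only complete test sets count, ceiling division when reduce_test=True keeps a trailing partial one. This is a sketch derived from the documented examples, assuming the window advances by test_size. `n_walk_forward_folds` is a hypothetical name, not part of skfolio. expend_train is omitted because it changes where each training set starts, not how many folds there are.

```python
def n_walk_forward_folds(n_observations: int, train_size: int, test_size: int,
                         purged_size: int = 0, reduce_test: bool = False) -> int:
    """Hypothetical closed-form fold count for integer sizes (freq=None)."""
    # Observations available for test sets once the first training window
    # and the purge gap are consumed.
    available = n_observations - train_size - purged_size
    if available <= 0:
        return 0
    if reduce_test:
        # A trailing partial test set counts as one more fold: ceiling division.
        return -(-available // test_size)
    # Only complete test sets count: floor division.
    return available // test_size


print(n_walk_forward_folds(6, train_size=2, test_size=1))
print(n_walk_forward_folds(6, train_size=2, test_size=1, purged_size=1))
print(n_walk_forward_folds(6, train_size=3, test_size=2))
print(n_walk_forward_folds(6, train_size=3, test_size=2, reduce_test=True))
```

The four calls use the same settings as the Examples section and agree with its fold counts: 4, 3, 1 and 2.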
- split(X, y=None, groups=None)
Generate indices to split data into training and test sets.
- Parameters:
- X : array-like of shape (n_observations, n_assets)
Price returns of the assets.
- y : array-like of shape (n_observations, n_targets)
Always ignored, exists for compatibility.
- groups : array-like of shape (n_observations,)
Always ignored, exists for compatibility.
- Yields:
- train : ndarray
The training set indices for that split.
- test : ndarray
The testing set indices for that split.
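Because split yields plain index arrays, a typical consumer fits on X[train_index] and evaluates on X[test_index]. The NumPy-only sketch below stitches the test-period returns of a toy rule into one out-of-sample series. `out_of_sample_returns` and `best_mean_asset` are hypothetical helpers, and the returns matrix is made-up illustration data. The hard-coded folds equal those of WalkForward(test_size=1, train_size=2) on 6 observations, shown in the first example above. With skfolio installed, cv.split(X) can be passed as `splits` instead.

```python
import numpy as np


def out_of_sample_returns(X, splits, weight_rule):
    """Stitch test-period portfolio returns from walk-forward splits.

    X: array of shape (n_observations, n_assets) holding asset returns.
    splits: iterable of (train_index, test_index) integer arrays.
    weight_rule: callable mapping training returns to a weight vector.
    """
    pieces = []
    for train_index, test_index in splits:
        # Guard against look-ahead: training rows must precede test rows.
        if train_index.max() >= test_index.min():
            raise ValueError("training indices must precede test indices")
        weights = weight_rule(X[train_index])   # fit on the training window only
        pieces.append(X[test_index] @ weights)  # evaluate on unseen test rows
    return np.concatenate(pieces)


def best_mean_asset(train_returns):
    """Toy rule (hypothetical): all weight on the highest mean-return asset."""
    weights = np.zeros(train_returns.shape[1])
    weights[np.argmax(train_returns.mean(axis=0))] = 1.0
    return weights


# Illustrative returns for 6 observations and 2 assets.
X = np.array([
    [0.01, 0.00],
    [0.02, -0.01],
    [-0.01, 0.03],
    [0.00, 0.02],
    [0.01, -0.02],
    [0.03, 0.01],
])
# Same folds as WalkForward(test_size=1, train_size=2) on 6 observations.
folds = [(np.array([i, i + 1]), np.array([i + 2])) for i in range(4)]

oos = out_of_sample_returns(X, folds, best_mean_asset)
print(oos)
```

Test sets never overlap and always follow their training windows. The concatenated series therefore covers each test observation exactly once, in time order.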