
class skfolio.model_selection.WalkForward(test_size, train_size, freq=None, freq_offset=None, previous=False, expend_train=False, reduce_test=False, purged_size=0)

Walk Forward Cross-Validator.

Provides train/test indices to split time series data samples using walk-forward logic.

In each split, test indices must be higher than those of the previous split; therefore, shuffling in the cross-validator is inappropriate.

Unlike sklearn.model_selection.TimeSeriesSplit, which is parameterized by the number of splits, WalkForward lets you control the train/test folds by specifying the number of training and test observations, which makes it better suited to portfolio cross-validation.
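
For comparison, the sketch below reproduces the folds of WalkForward(test_size=1, train_size=2) on 6 observations using scikit-learn's TimeSeriesSplit. Because TimeSeriesSplit is parameterized by n_splits, the split count has to be derived by hand from the window sizes (the variable names here are illustrative). The match holds only because the sizes divide evenly: TimeSeriesSplit aligns its folds to the end of the data, so with a remainder its first training window would not start at index 0.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

n_observations, train_size, test_size = 6, 2, 1
X = np.zeros((n_observations, 2))  # placeholder returns: only the shape matters

# TimeSeriesSplit is driven by n_splits, so derive it from the window sizes.
n_splits = (n_observations - train_size) // test_size  # 4 folds here
tss = TimeSeriesSplit(n_splits=n_splits, max_train_size=train_size, test_size=test_size)

# Convert the index arrays to plain lists for easy comparison and printing.
folds = [(train.tolist(), test.tolist()) for train, test in tss.split(X)]
for i, (train, test) in enumerate(folds):
    print(f"Fold {i}: train={train} test={test}")
```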

If your data is a DataFrame indexed with a DatetimeIndex, you can split the data using specific datetime frequencies and offsets.

test_size : int
Length of each test set. If freq is None (default), it represents the number of observations. Otherwise, it represents the number of periods defined by freq.

train_size : int | pandas.offsets.DateOffset | datetime.timedelta

Length of each training set. If freq is None (default), it represents the number of observations. Otherwise, an integer represents the number of periods defined by freq, while a pandas DateOffset or datetime timedelta represents the date offset applied to the start of each period.

freq : str | pandas.offsets.DateOffset, optional

If provided, it must be a frequency string or a pandas DateOffset, and the returns X must be a DataFrame with a DatetimeIndex. For a list of pandas frequencies and offsets, see the pandas documentation on offset aliases. The default (None) means that test_size and train_size represent numbers of observations.

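Below are some common examples, which use the following imports:

>>> import datetime as dt
>>> import pandas as pd
>>> from skfolio.model_selection import WalkForward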

  • Rebalancing : Monthly on the first day

  • Test Duration : 1 month

  • Train Duration : 6 months

>>> cv = WalkForward(test_size=1, train_size=6, freq="MS")

  • Rebalancing : Quarterly on the first day

  • Test Duration : 1 quarter

  • Train Duration : 2 months

>>> cv = WalkForward(test_size=1, train_size=pd.DateOffset(months=2), freq="QS")

  • Rebalancing : Monthly on the third Friday

  • Test Duration : 1 month

  • Train Duration : 6 weeks

>>> cv = WalkForward(test_size=1, train_size=pd.offsets.Week(6), freq="WOM-3FRI")

  • Rebalancing : Semi-monthly on the 15th and the last day of the month

  • Test Duration : 1 semi-month period (about half a month)

  • Train Duration : 2 semi-month periods (about 1 month)

>>> cv = WalkForward(test_size=1, train_size=2, freq=pd.offsets.SemiMonthEnd())

  • Rebalancing : Every 2 months on the second day

  • Test Duration : 2 months

  • Train Duration : 6 months

>>> cv = WalkForward(test_size=2, train_size=6, freq="MS", freq_offset=dt.timedelta(days=1))
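
The frequencies in these examples are standard pandas frequency strings and offsets. As a quick check of what they mean, the sketch below lists the dates pandas generates for a few of them in 2023. It uses only pandas and illustrates pandas semantics; it makes no claim about how WalkForward handles the dates internally. Note that SemiMonthEnd falls on the 15th and the last day of each month.

```python
import pandas as pd


def as_strings(dates: pd.DatetimeIndex) -> list:
    """Format dates as YYYY-MM-DD strings for easy reading."""
    return [d.strftime("%Y-%m-%d") for d in dates]


# Period boundaries generated by some of the frequencies used above (year 2023).
month_starts = as_strings(pd.date_range("2023-01-01", "2023-03-31", freq="MS"))
quarter_starts = as_strings(pd.date_range("2023-01-01", "2023-12-31", freq="QS"))
third_fridays = as_strings(pd.date_range("2023-01-01", "2023-03-31", freq="WOM-3FRI"))
semi_month_ends = as_strings(
    pd.date_range("2023-01-01", "2023-02-28", freq=pd.offsets.SemiMonthEnd())
)

print("MS          ", month_starts)     # first calendar day of each month
print("QS          ", quarter_starts)   # first day of each quarter
print("WOM-3FRI    ", third_fridays)    # third Friday of each month
print("SemiMonthEnd", semi_month_ends)  # 15th and last day of each month
```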

freq_offset : pandas DateOffset | datetime timedelta, optional

Only used if freq is provided. Shifts the dates generated by freq by the given pandas DateOffset or datetime timedelta.
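
A minimal sketch of the offset arithmetic, assuming freq_offset is simply added to each date generated by freq, as the description above suggests. Shifting month starts ("MS") by one day, with either a datetime.timedelta or a pandas DateOffset, moves every boundary to the 2nd of the month.

```python
import datetime as dt

import pandas as pd

month_starts = pd.date_range("2023-01-01", "2023-04-30", freq="MS")

# Assumption: freq_offset is added to every date generated by freq.
with_timedelta = month_starts + dt.timedelta(days=1)
with_date_offset = month_starts + pd.DateOffset(days=1)

# Both kinds of offset move each boundary from the 1st to the 2nd of the month.
print(list(with_timedelta.strftime("%Y-%m-%d")))
print(list(with_date_offset.strftime("%Y-%m-%d")))
```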

previous : bool, default=False

Only used if freq is provided. If a period start or end is not in the DatetimeIndex, the previous observation is used when set to True; otherwise (the default), the next observation is used.
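
The next/previous fallback can be sketched with DatetimeIndex.searchsorted. The helper locate below is hypothetical (it is not part of skfolio) and only illustrates the rule described above on a business-day index, where April 1, 2023 is a Saturday and is therefore missing.

```python
import pandas as pd

# Business days only: weekend dates are absent from the index.
index = pd.bdate_range("2023-03-01", "2023-04-28")


def locate(index: pd.DatetimeIndex, date: pd.Timestamp, previous: bool = False) -> pd.Timestamp:
    """Return the observation used for `date` (hypothetical helper, not skfolio's API)."""
    if previous:
        # Last observation on or before `date`.
        pos = index.searchsorted(date, side="right") - 1
    else:
        # First observation on or after `date`.
        pos = index.searchsorted(date, side="left")
    if not 0 <= pos < len(index):
        raise ValueError(f"{date} has no {'previous' if previous else 'next'} observation")
    return index[pos]


period_start = pd.Timestamp("2023-04-01")  # a Saturday, so not in the index
print(locate(index, period_start))                 # next observation: Monday 2023-04-03
print(locate(index, period_start, previous=True))  # previous observation: Friday 2023-03-31
```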

expend_train : bool, default=False

If set to True, each training set after the first one uses all past observations (an expanding window) instead of a rolling window of train_size observations. The default is False.

reduce_test : bool, default=False

If set to True, the last train/test split is returned even if its test set is partial (i.e., it contains fewer observations than test_size); otherwise, that split is dropped. The default is False.

purged_size : int, default=0

The number of observations skipped between the end of each training set and the start of its test set. The default value is 0.


>>> import numpy as np
>>> from skfolio.model_selection import WalkForward
>>> X = np.random.randn(6, 2)
>>> cv = WalkForward(test_size=1, train_size=2)
>>> for i, (train_index, test_index) in enumerate(cv.split(X)):
...     print(f"Fold {i}:")
...     print(f"  Train: index={train_index}")
...     print(f"  Test:  index={test_index}")
Fold 0:
  Train: index=[0 1]
  Test:  index=[2]
Fold 1:
  Train: index=[1 2]
  Test:  index=[3]
Fold 2:
  Train: index=[2 3]
  Test:  index=[4]
Fold 3:
  Train: index=[3 4]
  Test:  index=[5]
>>> cv = WalkForward(test_size=1, train_size=2, purged_size=1)
>>> for i, (train_index, test_index) in enumerate(cv.split(X)):
...     print(f"Fold {i}:")
...     print(f"  Train: index={train_index}")
...     print(f"  Test:  index={test_index}")
Fold 0:
  Train: index=[0 1]
  Test:  index=[3]
Fold 1:
  Train: index=[1 2]
  Test:  index=[4]
Fold 2:
  Train: index=[2 3]
  Test:  index=[5]
>>> cv = WalkForward(test_size=2, train_size=3)
>>> for i, (train_index, test_index) in enumerate(cv.split(X)):
...     print(f"Fold {i}:")
...     print(f"  Train: index={train_index}")
...     print(f"  Test:  index={test_index}")
Fold 0:
  Train: index=[0 1 2]
  Test:  index=[3 4]
>>> cv = WalkForward(test_size=2, train_size=3, reduce_test=True)
>>> for i, (train_index, test_index) in enumerate(cv.split(X)):
...     print(f"Fold {i}:")
...     print(f"  Train: index={train_index}")
...     print(f"  Test:  index={test_index}")
Fold 0:
  Train: index=[0 1 2]
  Test:  index=[3 4]
Fold 1:
  Train: index=[2 3 4]
  Test:  index=[5]
>>> cv = WalkForward(test_size=2, train_size=3, expend_train=True, reduce_test=True)
>>> for i, (train_index, test_index) in enumerate(cv.split(X)):
...     print(f"Fold {i}:")
...     print(f"  Train: index={train_index}")
...     print(f"  Test:  index={test_index}")
Fold 0:
  Train: index=[0 1 2]
  Test:  index=[3 4]
Fold 1:
  Train: index=[0 1 2 3 4]
  Test:  index=[5]
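
The fold layouts printed above can be reproduced with a short NumPy sketch of the count-based (freq=None) logic. walk_forward_indices is a hypothetical re-implementation inferred from these examples, not skfolio's source: the window rolls forward by test_size observations, purged_size observations are skipped between each training and test set, expend_train anchors the training start at 0, and reduce_test keeps a final partial test set.

```python
import numpy as np


def walk_forward_indices(n_observations, test_size, train_size,
                         expend_train=False, reduce_test=False, purged_size=0):
    """Yield (train, test) positional index arrays (hypothetical re-implementation)."""
    start = 0
    while True:
        train_end = start + train_size        # exclusive end of the training window
        test_start = train_end + purged_size  # skip `purged_size` observations
        test_end = test_start + test_size
        if test_start >= n_observations:
            break                             # no observation left to test on
        if test_end > n_observations:
            if not reduce_test:
                break                         # drop the partial last test set
            test_end = n_observations
        train_start = 0 if expend_train else start
        yield np.arange(train_start, train_end), np.arange(test_start, test_end)
        start += test_size                    # roll forward by one test period


def as_lists(folds):
    """Convert (train, test) arrays to plain lists for printing and comparison."""
    return [(train.tolist(), test.tolist()) for train, test in folds]


print(as_lists(walk_forward_indices(6, test_size=1, train_size=2, purged_size=1)))
# [([0, 1], [3]), ([1, 2], [4]), ([2, 3], [5])]
```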


get_metadata_routing()

Get metadata routing of this object.

get_n_splits([X, y, groups])

Return the number of splitting iterations in the cross-validator.

split(X[, y, groups])

Generate indices to split data into training and test set.

get_metadata_routing()

Get metadata routing of this object.

Please check the scikit-learn User Guide on how the routing mechanism works.

routing : MetadataRequest
A MetadataRequest encapsulating routing information.

get_n_splits(X=None, y=None, groups=None)

Return the number of splitting iterations in the cross-validator.

X : array-like of shape (n_observations, n_assets)

Price returns of the assets.

y : array-like of shape (n_observations, n_targets)

Always ignored, exists for compatibility.

groups : array-like of shape (n_observations,)

Always ignored, exists for compatibility.


Returns the number of splitting iterations in the cross-validator.
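
For freq=None, the examples above suggest a closed-form fold count. With a = n_observations - train_size - purged_size observations left after the first training window and its purge, the count is floor(a / test_size) complete test sets, or ceil(a / test_size) when reduce_test=True; expend_train does not change it. The function below is an assumption derived from the documented examples, not skfolio's implementation of get_n_splits.

```python
def n_walk_forward_splits(n_observations: int, test_size: int, train_size: int,
                          reduce_test: bool = False, purged_size: int = 0) -> int:
    """Number of count-based walk-forward folds (hypothetical closed form)."""
    # Observations left for testing after the first training window and its purge.
    available = n_observations - train_size - purged_size
    if available <= 0:
        return 0
    if reduce_test:
        # A trailing partial test set also counts: ceiling division.
        return (available + test_size - 1) // test_size
    # Only complete test sets count: floor division.
    return available // test_size


# The documented examples on 6 observations, in order.
print(n_walk_forward_splits(6, test_size=1, train_size=2))                    # 4
print(n_walk_forward_splits(6, test_size=1, train_size=2, purged_size=1))     # 3
print(n_walk_forward_splits(6, test_size=2, train_size=3))                    # 1
print(n_walk_forward_splits(6, test_size=2, train_size=3, reduce_test=True))  # 2
```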

split(X, y=None, groups=None)

Generate indices to split data into training and test set.

X : array-like of shape (n_observations, n_assets)

Price returns of the assets.

y : array-like of shape (n_observations, n_targets)

Always ignored, exists for compatibility.

groups : array-like of shape (n_observations,)

Always ignored, exists for compatibility.

train : ndarray
The training set indices for that split.

test : ndarray
The testing set indices for that split.
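
The yielded indices are integer positions (as in the examples above), so with a returns DataFrame indexed by dates the rows should be selected with .iloc rather than .loc. In the minimal sketch below, the fold list is copied from the first example's output so that it runs without skfolio, the returns are made-up placeholder values, and the in-sample mean stands in for any estimator fitted on the training set.

```python
import numpy as np
import pandas as pd

# Placeholder daily returns for two assets, indexed by business days.
returns = pd.DataFrame(
    np.arange(12, dtype=float).reshape(6, 2) / 100,
    index=pd.bdate_range("2023-01-02", periods=6),
    columns=["asset_a", "asset_b"],
)

# (train_index, test_index) pairs copied from the first example's output.
folds = [([0, 1], [2]), ([1, 2], [3]), ([2, 3], [4]), ([3, 4], [5])]

errors = []
for train_index, test_index in folds:
    # Positional indices: use .iloc, not .loc (the index holds dates).
    train, test = returns.iloc[train_index], returns.iloc[test_index]
    in_sample_mean = train.mean()         # "fit" on the training rows only
    errors.append(test - in_sample_mean)  # evaluate on the following test rows

forecast_errors = pd.concat(errors)
print(forecast_errors)
```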