skfolio.model_selection.WalkForward

class skfolio.model_selection.WalkForward(test_size, train_size, freq=None, freq_offset=None, previous=False, expend_train=False, reduce_test=False, purged_size=0)

Walk Forward Cross-Validator.

Provides train/test indices to split time series data samples using a walk-forward logic.

In each split, the test indices must be higher than those of the previous split; shuffling in the cross-validator is therefore inappropriate.

Compared to sklearn.model_selection.TimeSeriesSplit, you control the train/test folds by specifying the number of training and test samples instead of the number of splits, making it more suitable for portfolio cross-validation.

If your data is a DataFrame indexed with a DatetimeIndex, you can split the data using specific datetime frequencies and offsets.

Parameters:
test_size : int

Length of each test set. If freq is None (default), it represents the number of observations. Otherwise, it represents the number of periods defined by freq.

train_size : int | pandas.offsets.DateOffset | datetime.timedelta

Length of each training set. If freq is None (default), it represents the number of observations. Otherwise, for integers, it represents the number of periods defined by freq; for pandas DateOffset or datetime timedelta it represents the date offset applied to the start of each period.

freq : str | pandas.offsets.DateOffset, optional

If provided, it must be a frequency string or a pandas DateOffset, and the returns X must be a DataFrame with an index of type DatetimeIndex. For a list of pandas frequencies and offsets, see the pandas documentation on offset aliases. The default (None) means test_size and train_size represent the number of observations.

Below are some common examples:

  • Rebalancing : Monthly on the first day

  • Test Duration : 1 month

  • Train Duration : 6 months

>>> cv = WalkForward(test_size=1, train_size=6, freq="MS")

  • Rebalancing : Quarterly on the first day

  • Test Duration : 1 quarter

  • Train Duration : 2 months

>>> cv = WalkForward(test_size=1, train_size=pd.DateOffset(months=2), freq="QS")

  • Rebalancing : Monthly on the third Friday

  • Test Duration : 1 month

  • Train Duration : 6 weeks

>>> cv = WalkForward(test_size=1, train_size=pd.offsets.Week(6), freq="WOM-3FRI")

  • Rebalancing : Semi-monthly on the 15th and the last day of the month

  • Test Duration : half a month

  • Train Duration : 1 month

>>> cv = WalkForward(test_size=1, train_size=2, freq=pd.offsets.SemiMonthEnd())

  • Rebalancing : Every 2 months on the third day

  • Test Duration : 2 months

  • Train Duration : 6 months

>>> cv = WalkForward(test_size=2, train_size=6, freq="MS", freq_offset=dt.timedelta(days=2))

freq_offset : pandas DateOffset | datetime timedelta, optional

Only used if freq is provided. Offsets the freq by a pandas DateOffset or a datetime timedelta offset.
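As a rough illustration of the offset, the sketch below uses only the standard library to shift month-start dates (what the pandas alias "MS" denotes) by a two-day timedelta. It assumes the offset is simply added to each period boundary, which is how the description above reads; the dates themselves are arbitrary.

```python
import datetime as dt

# Month-start dates, i.e. the boundaries the pandas alias "MS" denotes.
month_starts = [dt.date(2024, month, 1) for month in range(1, 5)]

# Assumption: freq_offset is added to every period boundary.
offset = dt.timedelta(days=2)
shifted = [start + offset for start in month_starts]

for start, moved in zip(month_starts, shifted):
    print(start.isoformat(), "->", moved.isoformat())
# 2024-01-01 -> 2024-01-03
# 2024-02-01 -> 2024-02-03
# 2024-03-01 -> 2024-03-03
# 2024-04-01 -> 2024-04-03
```

Under that assumption, a two-day offset moves a period that would start on the 1st of the month to the 3rd.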

previous : bool, default=False

Only used if freq is provided. If set to True, and if the period start or period end is not in the DatetimeIndex, the previous observation is used; otherwise, the next observation is used (default).
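One way to picture the previous flag is as a choice between the two observations surrounding a missing boundary date. The helper below, locate_boundary, is hypothetical and is not part of skfolio; it is a minimal standard-library sketch that assumes a sorted list of observation dates and a boundary that falls inside their range.

```python
import bisect
import datetime as dt


def locate_boundary(index_dates, boundary, previous=False):
    """Hypothetical helper: position of the observation used for `boundary`.

    Assumes `index_dates` is sorted ascending and `boundary` lies within its range.
    """
    pos = bisect.bisect_left(index_dates, boundary)
    if pos < len(index_dates) and index_dates[pos] == boundary:
        return pos  # the boundary is itself an observation date
    # Boundary missing from the index: step back to the previous
    # observation, or keep the next one (the default behaviour).
    return pos - 1 if previous else pos


# Trading days around a weekend; 2024-06-01 is a Saturday.
dates = [dt.date(2024, 5, 30), dt.date(2024, 5, 31),
         dt.date(2024, 6, 3), dt.date(2024, 6, 4)]
boundary = dt.date(2024, 6, 1)

print(dates[locate_boundary(dates, boundary)])                 # 2024-06-03
print(dates[locate_boundary(dates, boundary, previous=True)])  # 2024-05-31
```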

expend_train : bool, default=False

If set to True, each subsequent training set after the first one will use all past observations. The default is False.

reduce_test : bool, default=False

If set to True, the last train/test split is returned even if its test set is partial (i.e., it contains fewer observations than test_size); otherwise, that split is ignored. The default is False.

purged_size : int, default=0

The number of observations to exclude from the end of each training set before the test set. The default value is 0.
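The interaction of train_size, test_size, expend_train, reduce_test and purged_size when freq is None can be sketched in plain Python. The generator below, walk_forward_indices, is a hypothetical helper written for illustration, not the library's implementation; it is only meant to reproduce the index patterns shown in the Examples section, and it assumes test_size is a positive integer.

```python
def walk_forward_indices(n_observations, test_size, train_size,
                         expend_train=False, reduce_test=False, purged_size=0):
    """Hypothetical helper (not skfolio code): yield (train, test) index lists."""
    # The first test set starts after the first training set and the purged gap.
    test_start = train_size + purged_size
    while test_start < n_observations:
        test_end = test_start + test_size
        if test_end > n_observations:
            if not reduce_test:
                break  # the partial last fold is ignored
            test_end = n_observations  # keep the shorter last test set
        # The training set ends purged_size observations before the test set.
        train_end = test_start - purged_size
        # Expanding window starts at 0; rolling window keeps train_size observations.
        train_start = 0 if expend_train else train_end - train_size
        yield list(range(train_start, train_end)), list(range(test_start, test_end))
        test_start = test_end  # the next test set follows the current one


for train, test in walk_forward_indices(6, test_size=2, train_size=3,
                                        expend_train=True, reduce_test=True):
    print("Train:", train, "Test:", test)
# Train: [0, 1, 2] Test: [3, 4]
# Train: [0, 1, 2, 3, 4] Test: [5]
```

Writing the loop around the test window, rather than the training window, keeps the test sets contiguous and non-overlapping regardless of the other options.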

Examples

>>> import numpy as np
>>> from skfolio.model_selection import WalkForward
>>> X = np.random.randn(6, 2)
>>> cv = WalkForward(test_size=1, train_size=2)
>>> for i, (train_index, test_index) in enumerate(cv.split(X)):
...     print(f"Fold {i}:")
...     print(f"  Train: index={train_index}")
...     print(f"  Test:  index={test_index}")
Fold 0:
  Train: index=[0 1]
  Test:  index=[2]
Fold 1:
  Train: index=[1 2]
  Test:  index=[3]
Fold 2:
  Train: index=[2 3]
  Test:  index=[4]
Fold 3:
  Train: index=[3 4]
  Test:  index=[5]
>>> cv = WalkForward(test_size=1, train_size=2, purged_size=1)
>>> for i, (train_index, test_index) in enumerate(cv.split(X)):
...     print(f"Fold {i}:")
...     print(f"  Train: index={train_index}")
...     print(f"  Test:  index={test_index}")
Fold 0:
  Train: index=[0 1]
  Test:  index=[3]
Fold 1:
  Train: index=[1 2]
  Test:  index=[4]
Fold 2:
  Train: index=[2 3]
  Test:  index=[5]
>>> cv = WalkForward(test_size=2, train_size=3)
>>> for i, (train_index, test_index) in enumerate(cv.split(X)):
...     print(f"Fold {i}:")
...     print(f"  Train: index={train_index}")
...     print(f"  Test:  index={test_index}")
Fold 0:
  Train: index=[0 1 2]
  Test:  index=[3 4]
>>> cv = WalkForward(test_size=2, train_size=3, reduce_test=True)
>>> for i, (train_index, test_index) in enumerate(cv.split(X)):
...     print(f"Fold {i}:")
...     print(f"  Train: index={train_index}")
...     print(f"  Test:  index={test_index}")
Fold 0:
  Train: index=[0 1 2]
  Test:  index=[3 4]
Fold 1:
  Train: index=[2 3 4]
  Test:  index=[5]
>>> cv = WalkForward(test_size=2, train_size=3, expend_train=True, reduce_test=True)
>>> for i, (train_index, test_index) in enumerate(cv.split(X)):
...     print(f"Fold {i}:")
...     print(f"  Train: index={train_index}")
...     print(f"  Test:  index={test_index}")
Fold 0:
  Train: index=[0 1 2]
  Test:  index=[3 4]
Fold 1:
  Train: index=[0 1 2 3 4]
  Test:  index=[5]

Methods

get_metadata_routing()

Get metadata routing of this object.

get_n_splits([X, y, groups])

Returns the number of splitting iterations in the cross-validator.

split(X[, y, groups])

Generate indices to split data into training and test sets.

get_metadata_routing()

Get metadata routing of this object.

See the scikit-learn User Guide on metadata routing for how the routing mechanism works.

Returns:
routing : MetadataRequest

A MetadataRequest encapsulating routing information.

get_n_splits(X=None, y=None, groups=None)

Returns the number of splitting iterations in the cross-validator.

Parameters:
X : array-like of shape (n_observations, n_assets)

Price returns of the assets.

y : array-like of shape (n_observations, n_targets)

Always ignored, exists for compatibility.

groups : array-like of shape (n_observations,)

Always ignored, exists for compatibility.

Returns:
n_folds : int

Returns the number of splitting iterations in the cross-validator.
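When freq is None, the fold counts in the Examples section are consistent with a simple closed-form count. The function below, count_walk_forward_splits, is a hypothetical helper and an assumption about the arithmetic, not a copy of the library's code; it assumes test_size is a positive integer.

```python
def count_walk_forward_splits(n_observations, test_size, train_size,
                              reduce_test=False, purged_size=0):
    """Hypothetical helper: number of folds when splitting by observation count."""
    # Observations left for testing once the first training set and the
    # purged gap have been set aside.
    available = n_observations - train_size - purged_size
    if available <= 0:
        return 0
    n_full, remainder = divmod(available, test_size)
    # A partial last test set only counts when reduce_test is enabled.
    return n_full + 1 if (reduce_test and remainder) else n_full


print(count_walk_forward_splits(6, test_size=1, train_size=2))                    # 4
print(count_walk_forward_splits(6, test_size=1, train_size=2, purged_size=1))     # 3
print(count_walk_forward_splits(6, test_size=2, train_size=3))                    # 1
print(count_walk_forward_splits(6, test_size=2, train_size=3, reduce_test=True))  # 2
```

This makes the contrast with a fixed number of splits concrete: here the number of folds follows from the window sizes and the length of the data.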

split(X, y=None, groups=None)

Generate indices to split data into training and test sets.

Parameters:
X : array-like of shape (n_observations, n_assets)

Price returns of the assets.

y : array-like of shape (n_observations, n_targets)

Always ignored, exists for compatibility.

groups : array-like of shape (n_observations,)

Always ignored, exists for compatibility.

Yields:
train : ndarray

The training set indices for that split.

test : ndarray

The testing set indices for that split.