skfolio.model_selection.optimal_folds_number

skfolio.model_selection.optimal_folds_number(n_observations, target_train_size, target_n_test_paths, weight_train_size=1, weight_n_test_paths=1)

Find the optimal number of folds (total folds and test folds) for a target training size and a target number of test paths.

We find x = n_folds and y = n_test_folds that minimize the following cost function, which is the weighted sum of the relative distances from the two targets:

\[\text{cost}(x,y) = w_{f} \times \left\lvert\frac{f(x,y)-f_{target}}{f_{target}}\right\rvert + w_{g} \times \left\lvert\frac{g(x,y)-g_{target}}{g_{target}}\right\rvert\]

where \(w_{f}\) and \(w_{g}\) are the weights assigned to the distance from each target, and \(f(x,y)\) and \(g(x,y)\) are respectively the average training size and the number of test paths, both as functions of the number of total folds and test folds.
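As a rough illustration, the cost function can be sketched in plain Python. This page does not give explicit formulas for \(f\) and \(g\), so the sketch assumes equal-sized folds with no purging or embargo for the average training size, and the usual combinatorial cross-validation count \({x \choose y} \times y / x\) for the number of test paths. The helper names `avg_train_size`, `n_test_paths` and `cost` are hypothetical and are not part of the skfolio API.

```python
import math


def avg_train_size(n_observations: int, n_folds: int, n_test_folds: int) -> float:
    """f(x, y): average training size, assuming equal-sized folds (assumption)."""
    return n_observations / n_folds * (n_folds - n_test_folds)


def n_test_paths(n_folds: int, n_test_folds: int) -> int:
    """g(x, y): number of test paths, assumed to be C(x, y) * y / x."""
    return math.comb(n_folds, n_test_folds) * n_test_folds // n_folds


def cost(
    n_observations: int,
    n_folds: int,
    n_test_folds: int,
    target_train_size: int,
    target_n_test_paths: int,
    weight_train_size: float = 1.0,
    weight_n_test_paths: float = 1.0,
) -> float:
    """Weighted sum of the relative distances from the two targets."""
    f = avg_train_size(n_observations, n_folds, n_test_folds)
    g = n_test_paths(n_folds, n_test_folds)
    return (
        weight_train_size * abs(f - target_train_size) / target_train_size
        + weight_n_test_paths * abs(g - target_n_test_paths) / target_n_test_paths
    )


# 10 folds with 2 test folds on 1000 observations: f = 800, g = 9
print(cost(1000, 10, 2, target_train_size=400, target_n_test_paths=9))  # 1.0
```

Under these assumptions, a pair that hits both targets exactly has a cost of zero, and each term grows linearly with the relative miss on its target.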

This is a combinatorial problem with \(\frac{T\times(T-3)}{2}\) combinations, with \(T\) the number of observations.

We reduce the search space by using the combinatorial symmetry \({n \choose k}={n \choose n-k}\) and by skipping cost computations once the cost exceeds 1e5.
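The search can be sketched as a pruned brute-force loop. This is one possible reading of the reduction described above, not necessarily the library's implementation. For each number of folds, it evaluates test-fold counts in increasing order until the cost exceeds the cap, then skips the middle of the range and resumes on the mirrored tail where \({n \choose n-k}\) is small again. The function name `search_folds`, the `cost_cap` parameter, and the formulas used for the training size and the number of test paths are assumptions made for illustration.

```python
import math


def search_folds(
    n_observations: int,
    target_train_size: int,
    target_n_test_paths: int,
    weight_train_size: float = 1.0,
    weight_n_test_paths: float = 1.0,
    cost_cap: float = 1e5,
) -> tuple[int, int]:
    """Return the (n_folds, n_test_folds) pair with the lowest cost (sketch)."""
    best_cost = math.inf
    best = None
    for n_folds in range(3, n_observations + 1):
        cutoff = None  # first n_test_folds whose cost exceeded cost_cap
        for n_test_folds in range(2, n_folds):
            # Once the cap is hit, skip the middle of the range and only
            # resume on the mirrored tail, where C(n, k) = C(n, n - k).
            if cutoff is not None and n_folds - n_test_folds > cutoff:
                continue
            # Assumed formulas: equal-sized folds, C(x, y) * y / x test paths.
            train_size = n_observations / n_folds * (n_folds - n_test_folds)
            n_paths = math.comb(n_folds, n_test_folds) * n_test_folds // n_folds
            cost = (
                weight_train_size
                * abs(train_size - target_train_size)
                / target_train_size
                + weight_n_test_paths
                * abs(n_paths - target_n_test_paths)
                / target_n_test_paths
            )
            if cost < best_cost:
                best_cost, best = cost, (n_folds, n_test_folds)
            if cutoff is None and cost > cost_cap:
                cutoff = n_test_folds
    if best is None:
        raise ValueError("n_observations must be at least 3")
    return best


print(search_folds(100, target_train_size=80, target_n_test_paths=9))  # (10, 2)
```

In this sketch the pruning also keeps the binomial coefficients small enough to divide safely, since the largest values in the middle of each row are never evaluated.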

Parameters:
n_observations : int

Number of observations.

target_train_size : int

The target number of observations in the training set.

target_n_test_paths : int

The target number of test paths (that can be reconstructed from the train/test combinations).

weight_train_size : float, default=1

The weight assigned to the distance from the target train size. The default value is 1.

weight_n_test_paths : float, default=1

The weight assigned to the distance from the target number of test paths. The default value is 1.

Returns:
n_folds : int

Optimal number of total folds.

n_test_folds : int

Optimal number of test folds.