skfolio.model_selection.optimal_folds_number
- skfolio.model_selection.optimal_folds_number(n_observations, target_train_size, target_n_test_paths, weight_train_size=1, weight_n_test_paths=1)
Find the optimal number of folds (total folds and test folds) for a target training size and a target number of test paths.
We find x = n_folds and y = n_test_folds that minimize the following cost function, the weighted relative distance from the two targets:
\[cost(x,y) = w_{f} \times \lvert\frac{f(x,y)-f_{target}}{f_{target}}\rvert + w_{g} \times \lvert\frac{g(x,y)-g_{target}}{g_{target}}\rvert\]
with \(w_{f}\) and \(w_{g}\) the weights assigned to the distance from each target (weight_train_size and weight_n_test_paths), \(f_{target}\) and \(g_{target}\) the two targets (target_train_size and target_n_test_paths), and \(f(x,y)\) and \(g(x,y)\) the average training size and the number of test paths as functions of the number of total folds and test folds.
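A minimal Python sketch of this cost, assuming the usual combinatorial purged cross-validation formulas, which this page does not state: with \(T\) observations split into \(x\) folds of which \(y\) are test folds, the average training size is \(f(x,y) = T\,(x-y)/x\) and the number of test paths is \(g(x,y) = {x \choose y}\,y/x\). The helper names `avg_train_size`, `n_test_paths` and `cost` are hypothetical and not part of skfolio's API.

```python
from math import comb


def avg_train_size(n_observations: int, n_folds: int, n_test_folds: int) -> float:
    """f(x, y): average training size per split (assumed formula T * (x - y) / x)."""
    return n_observations / n_folds * (n_folds - n_test_folds)


def n_test_paths(n_folds: int, n_test_folds: int) -> int:
    """g(x, y): number of test paths, C(x, y) * y / x (always equal to C(x - 1, y - 1))."""
    return comb(n_folds, n_test_folds) * n_test_folds // n_folds


def cost(
    n_folds: int,
    n_test_folds: int,
    n_observations: int,
    target_train_size: int,
    target_n_test_paths: int,
    weight_train_size: float = 1.0,
    weight_n_test_paths: float = 1.0,
) -> float:
    """Weighted sum of the relative distances of f and g from their targets."""
    f = avg_train_size(n_observations, n_folds, n_test_folds)
    g = n_test_paths(n_folds, n_test_folds)
    return (
        weight_train_size * abs(f - target_train_size) / target_train_size
        + weight_n_test_paths * abs(g - target_n_test_paths) / target_n_test_paths
    )
```

For instance, with 1000 observations, 10 folds and 2 test folds, this gives an average training size of 800 and \({10 \choose 2} \cdot 2/10 = 9\) test paths, so the cost against targets of 800 and 9 is zero.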
This is a combinatorial problem with \(\frac{T\times(T-3)}{2}\) combinations, with \(T\) the number of observations.
We reduce the search space by using the combinatorial symmetry \({n \choose k}={n \choose n-k}\) and by skipping the cost computation once the cost exceeds 1e5.
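The search can be sketched as a double loop over \(x\) and \(y\). This is an illustrative reimplementation under the same assumed formulas for \(f\) and \(g\), not skfolio's source: the function name `optimal_folds_number_sketch`, its `max_cost` parameter and the loop bounds \(3 \le x \le T\), \(2 \le y < x\) are assumptions. Once the cost at some \(y_0\) exceeds 1e5, the heuristic skips the middle of the \(y\) range, where, by the symmetry and unimodality of the binomial coefficient, the number of test paths is at least as large, and resumes at the mirror-image values of \(y\).

```python
from math import comb


def optimal_folds_number_sketch(
    n_observations: int,
    target_train_size: int,
    target_n_test_paths: int,
    weight_train_size: float = 1.0,
    weight_n_test_paths: float = 1.0,
    max_cost: float = 1e5,
) -> tuple[int, int]:
    """Brute-force search over (n_folds, n_test_folds) with symmetry-based pruning."""

    def cost(x: int, y: int) -> float:
        f = n_observations / x * (x - y)  # f(x, y): average training size (assumed)
        g = comb(x, y) * y // x  # g(x, y): number of test paths (assumed)
        return (
            weight_train_size * abs(f - target_train_size) / target_train_size
            + weight_n_test_paths * abs(g - target_n_test_paths) / target_n_test_paths
        )

    best_cost, best_pair = float("inf"), (0, 0)
    for n_folds in range(3, n_observations + 1):
        cutoff = None  # first n_test_folds whose cost exceeded max_cost
        for n_test_folds in range(2, n_folds):
            # Past the cutoff, skip the middle of the range: by C(x, y) = C(x, x - y)
            # those y have at least as many test paths as the cutoff did.
            if cutoff is not None and n_folds - n_test_folds > cutoff:
                continue
            c = cost(n_folds, n_test_folds)
            if c < best_cost:  # strict '<' keeps the first minimum found
                best_cost, best_pair = c, (n_folds, n_test_folds)
            if cutoff is None and c > max_cost:
                cutoff = n_test_folds
    return best_pair
```

Calling `optimal_folds_number_sketch(1000, 800, 9)` should return `(10, 2)`, the only pair that hits both targets exactly. Ties are resolved in favor of the first pair visited, that is, the smallest number of folds.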
- Parameters:
- n_observations : int
Number of observations.
- target_train_size : int
The target number of observations in the training set.
- target_n_test_paths : int
The target number of test paths (that can be reconstructed from the train/test combinations).
- weight_train_size : float, default=1
The weight assigned to the distance from the target train size.
- weight_n_test_paths : float, default=1
The weight assigned to the distance from the target number of test paths.
- Returns:
- n_folds : int
Optimal number of total folds.
- n_test_folds : int
Optimal number of test folds.