skfolio.model_selection.cross_val_predict

skfolio.model_selection.cross_val_predict(estimator, X, y=None, cv=None, n_jobs=None, method='predict', verbose=0, params=None, pre_dispatch='2*n_jobs', column_indices=None, portfolio_params=None)

Generate cross-validated Portfolio estimates.

The data is split according to the cv parameter. The optimization estimator is fitted on the training set and portfolios are predicted on the corresponding test set.

For non-combinatorial cross-validation such as KFold, the output is a single predicted MultiPeriodPortfolio in which each Portfolio is the prediction for one train/test pair (k portfolios for KFold).
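To make the train/test pairing concrete, here is a minimal, library-independent sketch of contiguous K-fold splitting without shuffling. The helper `kfold_splits` is hypothetical and not part of skfolio; it only mimics the index layout so that the "one Portfolio per test fold" structure is visible.

```python
def kfold_splits(n_observations, n_folds):
    """Yield (train, test) index lists for contiguous, unshuffled K-fold."""
    base, extra = divmod(n_observations, n_folds)
    start = 0
    for fold in range(n_folds):
        # The first `extra` folds receive one additional observation.
        stop = start + base + (1 if fold < extra else 0)
        test = list(range(start, stop))
        # Everything outside the test segment is used for training.
        train = list(range(0, start)) + list(range(stop, n_observations))
        yield train, test
        start = stop


splits = list(kfold_splits(n_observations=10, n_folds=3))
for train, test in splits:
    print("train:", train, "test:", test)
```

In this sketch each of the three test segments would yield one Portfolio, and the three segments together cover the data once, forming a single path.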

For combinatorial cross-validation such as CombinatorialPurgedCV, the output is a predicted Population of several MultiPeriodPortfolio objects (the test outputs form a collection of multiple paths instead of a single path).
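The number of splits and paths can be estimated with the standard counting argument for combinatorial cross-validation: choosing the test folds among all folds gives a binomial number of splits, and because every fold serves as a test fold equally often, the test predictions recombine into `n_splits * n_test_folds / n_folds` paths. The sketch below assumes this general counting rule and is not a transcription of skfolio's implementation; the function and argument names are hypothetical.

```python
from math import comb


def combinatorial_cv_counts(n_folds, n_test_folds):
    """Return (number of train/test splits, number of recombined test paths)."""
    # Number of ways to pick the test folds among all folds.
    n_splits = comb(n_folds, n_test_folds)
    # Each fold is a test fold in the same number of splits, so the
    # test predictions can be stitched into this many full paths.
    n_paths = n_splits * n_test_folds // n_folds
    return n_splits, n_paths


print(combinatorial_cv_counts(n_folds=6, n_test_folds=2))  # (15, 5)
```

Under these assumptions, each path would correspond to one MultiPeriodPortfolio in the returned Population.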

Parameters:
estimator : BaseOptimization

Optimization estimator used to fit the data.

X : array-like of shape (n_observations, n_assets)

Price returns of the assets.

y : array-like of shape (n_observations, n_targets), optional

Target data (optional). For example, the price returns of the factors.

cv : int | cross-validation generator, optional

Determines the cross-validation splitting strategy. Possible inputs for cv are:

  • None, to use the default 5-fold cross-validation,

  • int, to specify the number of folds in a (Stratified)KFold,

  • CV splitter,

  • An iterable that generates (train, test) splits as arrays of indices.
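As an illustration of the last option, the following is a minimal sketch of a hand-written generator that produces expanding-window (train, test) pairs. The function `expanding_window_splits` and its parameters are hypothetical. Plain lists keep the sketch dependency-free, so they would likely need converting to index arrays (for example with `numpy.asarray`) before being supplied as cv.

```python
def expanding_window_splits(n_observations, initial_train_size, test_size):
    """Yield (train, test) index lists with a growing training window."""
    train_end = initial_train_size
    # Stop as soon as a full test window no longer fits in the data.
    while train_end + test_size <= n_observations:
        train = list(range(0, train_end))
        test = list(range(train_end, train_end + test_size))
        yield train, test
        # The next training window absorbs the test window just used.
        train_end += test_size


splits = list(
    expanding_window_splits(n_observations=10, initial_train_size=4, test_size=3)
)
for train, test in splits:
    print("train:", train, "test:", test)
```

The test windows here are contiguous and non-overlapping, which matches the single-path layout described for non-combinatorial cross-validation.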

n_jobs : int, optional

The number of jobs to run in parallel when fitting the estimators. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors.

method : str, default='predict'

Name of the estimator method to invoke.

verbose : int, default=0

The verbosity level.

params : dict, optional

Parameters to pass to the underlying estimator’s fit and the CV splitter.

pre_dispatch : int or str, default='2*n_jobs'

Controls the number of jobs that get dispatched during parallel execution. Reducing this number can be useful to avoid an explosion of memory consumption when more jobs get dispatched than CPUs can process. This parameter can be:

  • None, in which case all the jobs are immediately created and spawned. Use this for lightweight and fast-running jobs, to avoid delays due to on-demand spawning of the jobs,

  • An int, giving the exact number of total jobs that are spawned,

  • A str, giving an expression as a function of n_jobs, as in '2*n_jobs'.
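As a rough illustration of the three accepted forms, the sketch below resolves a pre_dispatch value for a given n_jobs. The helper `resolve_pre_dispatch` is hypothetical and only mirrors the arithmetic; it is not necessarily how joblib parses the expression internally.

```python
def resolve_pre_dispatch(pre_dispatch, n_jobs):
    """Return the number of jobs to dispatch up front, or None for no limit."""
    if pre_dispatch is None:
        # No limit: every job is created immediately.
        return None
    if isinstance(pre_dispatch, int):
        # An explicit job count is used as-is.
        return pre_dispatch
    # Evaluate the expression with only `n_jobs` in scope and no builtins.
    return int(eval(pre_dispatch, {"__builtins__": {}}, {"n_jobs": n_jobs}))


print(resolve_pre_dispatch("2*n_jobs", n_jobs=4))  # 8
```

With n_jobs=4, the default expression would therefore cap the queue at 8 pending jobs, which bounds memory use when there are many splits.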

column_indices : ndarray, optional

Indices of the X columns to cross-validate on.

portfolio_params : dict, optional

Additional portfolio parameters passed to MultiPeriodPortfolio.

Returns:
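predictions : MultiPeriodPortfolio | Population

The result of calling predict on the test sets: a MultiPeriodPortfolio for non-combinatorial cross-validation, or a Population of MultiPeriodPortfolio for combinatorial cross-validation.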