skfolio.model_selection.OnlineRandomizedSearch

class skfolio.model_selection.OnlineRandomizedSearch(estimator, param_distributions, *, n_iter=10, scoring=None, warmup_size=252, test_size=1, freq=None, freq_offset=None, previous=False, purged_size=0, reduce_test=False, refit=True, random_state=None, error_score=nan, return_predictions=False, portfolio_params=None, n_jobs=None, verbose=0)

Online randomized search over hyperparameters.

Each sampled parameter combination is evaluated by running a full online walk-forward pass. Unlike OnlineGridSearch, not every parameter value is tried. Instead, a fixed number of parameter settings, given by n_iter, is sampled from the specified distributions.

If all parameters are given as lists, sampling is performed without replacement. If at least one parameter is given as a distribution, sampling with replacement is used. Continuous distributions are strongly recommended for continuous parameters.
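The sampling rule can be illustrated with a small standard-library sketch. It is not skfolio's implementation: `sample_candidates` and `UniformStandIn` are hypothetical names, the `rvs(rng)` signature is a simplification (scipy.stats distributions accept `random_state` instead), and capping `n_iter` at the grid size is an assumption.

```python
import itertools
import random


class UniformStandIn:
    """Minimal stand-in for a continuous distribution exposing ``rvs``."""

    def __init__(self, low, high):
        self.low, self.high = low, high

    def rvs(self, rng):
        return rng.uniform(self.low, self.high)


def sample_candidates(param_distributions, n_iter, seed=None):
    """Hypothetical helper mimicking the sampling rule described above."""
    rng = random.Random(seed)
    names = sorted(param_distributions)
    if all(isinstance(param_distributions[k], list) for k in names):
        # Every entry is a list: enumerate the finite grid and draw
        # distinct combinations (sampling without replacement).
        grid = list(itertools.product(*(param_distributions[k] for k in names)))
        chosen = rng.sample(grid, min(n_iter, len(grid)))
        return [dict(zip(names, combo)) for combo in chosen]
    # At least one distribution: draw every candidate independently
    # (sampling with replacement).
    candidates = []
    for _ in range(n_iter):
        candidate = {}
        for k in names:
            spec = param_distributions[k]
            candidate[k] = rng.choice(spec) if isinstance(spec, list) else spec.rvs(rng)
        candidates.append(candidate)
    return candidates


grid_only = sample_candidates({"a": [1, 2], "b": [10, 20]}, n_iter=4, seed=0)
mixed = sample_candidates({"a": [1, 2], "h": UniformStandIn(10, 100)}, n_iter=5, seed=0)
```

With list-only input the four candidates are all distinct, whereas the mixed input may repeat values of `a` across draws.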

Parameters:

estimator : BaseEstimator

Estimator that supports partial_fit.

param_distributions : dict or list of dicts

Dictionary with parameter names (str) as keys and distributions or lists of candidate values as values. Distributions must provide an rvs method for sampling (such as those from scipy.stats.distributions). If a list is given, it is sampled uniformly. If a list of dicts is given, a dict is first sampled uniformly, and then each parameter is sampled from that dict as above.
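The two-stage draw for a list of dicts can be sketched as follows. This is a simplified illustration, not the library's code: `sample_from_spaces` is a hypothetical helper, the parameter names are placeholders, only list-valued entries are handled, and draws are independent so repeated candidates are possible.

```python
import random


def sample_from_spaces(spaces, n_iter, seed=None):
    """Pick a dict uniformly, then sample one value per key of that dict."""
    rng = random.Random(seed)
    candidates = []
    for _ in range(n_iter):
        space = rng.choice(spaces)  # stage 1: choose one search space
        # stage 2: sample each parameter of that space uniformly
        candidates.append(
            {name: rng.choice(values) for name, values in space.items()}
        )
    return candidates


# Placeholder parameter names, not real estimator parameters.
spaces = [
    {"method": ["a"], "alpha": [0.1, 0.5, 0.9]},
    {"method": ["b"], "window": [20, 60]},
]
candidates = sample_from_spaces(spaces, n_iter=6, seed=0)
```

Because each candidate comes from exactly one dict, parameters that only make sense together (here `alpha` with method `"a"`, `window` with method `"b"`) are never mixed.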

n_iter : int, default=10

Number of parameter settings that are sampled. n_iter trades off runtime vs quality of the solution.

scoring : callable, dict, BaseMeasure, or None, default=None

Scoring specification. Semantics depend on the estimator type:

Component estimators (e.g. covariance, expected returns): None uses estimator.score; otherwise pass a callable scorer(estimator, X_test) or a dict of such callables.
Portfolio optimization estimators: a BaseMeasure or a dict of measures. None defaults to SHARPE_RATIO.

For portfolio optimization estimators, online evaluation scores the aggregated out-of-sample MultiPeriodPortfolio, rather than scoring each test window independently and averaging as in GridSearchCV. Pass the measure enum directly; make_scorer is not supported.
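The difference between scoring the aggregated out-of-sample series and averaging per-window scores can be seen on a toy example. The returns below are made up for illustration, and `sharpe` is a plain non-annualized mean-over-standard-deviation ratio, not skfolio's measure implementation.

```python
import statistics


def sharpe(returns):
    """Non-annualized Sharpe ratio: mean over sample standard deviation."""
    return statistics.mean(returns) / statistics.stdev(returns)


# Hypothetical out-of-sample returns for three consecutive test windows.
windows = [
    [0.010, 0.012, 0.011],
    [-0.020, 0.030, -0.010],
    [0.005, 0.004, 0.006],
]

# Online-style: concatenate all test windows, then score the series once.
aggregated = sharpe([r for window in windows for r in window])

# GridSearchCV-style: score each window separately, then average the scores.
averaged = statistics.mean(sharpe(window) for window in windows)
```

Here the averaged score is dominated by the two calm windows, while the aggregated score also reflects the volatility of the middle window, so the two numbers can rank candidates differently.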

warmup_size : int, default=252

Number of initial observations (or periods when freq is set) used for the first partial_fit call.

test_size : int, default=1

Number of observations (or periods when freq is set) per test window.

freq : str | pandas.offsets.BaseOffset, optional

Rebalancing frequency. When provided, warmup_size and test_size are interpreted as period counts rather than observation counts, and X must be a DataFrame with a DatetimeIndex. See WalkForward for details and examples.

freq_offset : pandas.offsets.BaseOffset | datetime.timedelta, optional

Offset applied to the freq boundaries. Only used when freq is provided.

previous : bool, default=False

Only used when freq is provided. If True, period boundaries that fall between observations snap to the previous observation; otherwise they snap to the next.

purged_size : int, default=0

Number of observations (or periods) to skip between the last data the model sees and the start of the test window.

reduce_test : bool, default=False

If True, the last test window is included even when it contains fewer observations than test_size.
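How warmup_size, test_size, purged_size and reduce_test interact can be sketched for the observation-count case (freq=None). The helper `online_windows` is hypothetical and encodes one simplified reading of the parameter descriptions (the model's seen data grows by test_size after each window); the library's exact boundary handling may differ.

```python
def online_windows(n_observations, warmup_size=252, test_size=1,
                   purged_size=0, reduce_test=False):
    """Yield (seen_end, test_start, test_end) as half-open index bounds.

    seen_end   : the model has been updated on observations [0, seen_end)
    test_start : first test observation, after skipping purged_size rows
    test_end   : one past the last test observation
    """
    seen_end = warmup_size
    while True:
        test_start = seen_end + purged_size
        test_end = test_start + test_size
        if test_end > n_observations:
            # A shorter final window is kept only when reduce_test is True.
            if reduce_test and test_start < n_observations:
                yield seen_end, test_start, n_observations
            return
        yield seen_end, test_start, test_end
        seen_end += test_size  # assumed: the model then absorbs test_size rows


full_only = list(online_windows(20, warmup_size=10, test_size=4, purged_size=1))
with_partial = list(
    online_windows(20, warmup_size=10, test_size=4, purged_size=1, reduce_test=True)
)
```

With 20 observations, the default drops the trailing one-observation window, while reduce_test=True keeps it as a third, shorter window.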

refit : bool, str, or callable, default=True

Controls how the best candidate is selected and whether the selected fitted candidate is exposed as best_estimator_.

This parameter is named for API alignment with scikit-learn. Unlike scikit-learn search estimators, enabling refit does not trigger an additional fit after model selection because each candidate is already evaluated through a full online walk-forward pass and updated through the full sample.

Single-metric scoring: True or False are both supported. If False, best_estimator_ is not stored, but best_index_, best_params_, and best_score_ remain available.
Multi-metric scoring: set to a scorer name to select the best candidate for that metric, or to False to disable best-candidate selection and storage of best_estimator_.
A callable receives cv_results_ and must return the best candidate index.
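A callable for refit might look like the sketch below. The selection rule (best score among candidates under a fit-time budget) and the name `best_within_time_budget` are illustrative assumptions; only the `mean_score` and `fit_time` keys documented for cv_results_ in the single-metric case are used.

```python
import math


def best_within_time_budget(cv_results, max_fit_time=60.0):
    """Return the index of the best-scoring candidate under a time budget."""
    scores = cv_results["mean_score"]
    times = cv_results["fit_time"]
    # Ignore candidates whose score is NaN (for example failed fits).
    valid = [i for i in range(len(scores)) if not math.isnan(scores[i])]
    # Prefer candidates within the budget; otherwise fall back to all valid ones.
    eligible = [i for i in valid if times[i] <= max_fit_time] or valid
    # max() raises ValueError if every candidate has a NaN score.
    return max(eligible, key=lambda i: scores[i])


# Mock results standing in for a fitted search's cv_results_.
mock_results = {
    "mean_score": [0.5, 0.9, 0.7],
    "fit_time": [10.0, 120.0, 30.0],
}
chosen = best_within_time_budget(mock_results)
```

Such a function would be passed as `refit=best_within_time_budget`; it indexes the arrays positionally, so it works with lists as well as ndarrays.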

random_state : int, RandomState instance or None, default=None

Pseudo-random number generator state used when sampling uniformly from lists of possible values rather than from scipy.stats distributions. Pass an int for reproducible output across multiple function calls.

error_score : "raise" or float, default=np.nan

Value to assign to the score if an error occurs during fitting. If set to "raise", the error is raised.
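The error_score policy amounts to the pattern below. This is a generic sketch rather than skfolio's code: `evaluate_candidate`, `run_pass` and `failing_pass` are hypothetical names, with `run_pass` standing in for a full walk-forward pass that returns a score.

```python
import math


def evaluate_candidate(run_pass, params, error_score=math.nan):
    """Run one candidate and apply the error_score policy on failure."""
    try:
        return run_pass(params)
    except Exception:
        if error_score == "raise":
            raise  # propagate the original error and stop the search
        return error_score  # record the fallback score and continue


def failing_pass(params):
    """Stand-in for a candidate whose fit fails."""
    raise ValueError("optimization failed")


fallback = evaluate_candidate(failing_pass, {"half_life": 20})
```

With the default, a failing candidate receives a NaN score and the remaining candidates are still evaluated.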

return_predictions : bool, default=False

If True, store MultiPeriodPortfolio objects per candidate in cv_results_["predictions"]. Only applies to portfolio optimization estimators.

portfolio_params : dict, optional

Parameters forwarded to MultiPeriodPortfolio when scoring portfolio estimators.

n_jobs : int or None, default=None

Number of parallel jobs. None means 1.

verbose : int, default=0

Verbosity level for joblib.Parallel.

Attributes:

cv_results_ : dict[str, ndarray]

A dict with keys:

params: list of candidate parameter dicts.
mean_score: array of aggregate scores (or mean_score_<name> for multi-metric).
rank: array of ranks where 1 is best (or rank_<name> for multi-metric).
fit_time: array of wall-clock times.
predictions: object array of MultiPeriodPortfolio or None aligned with candidates (only when return_predictions=True and the estimator is portfolio-based).
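Once a search is fitted, the documented keys can be combined into a short leaderboard. The sketch below uses a mock dict in place of a real cv_results_ (single-metric case); `top_candidates` and the parameter name `half_life` in the mock data are illustrative.

```python
def top_candidates(cv_results, top=3):
    """Return (rank, mean_score, params) tuples for the best candidates."""
    n_candidates = len(cv_results["params"])
    # Rank 1 is best, so sort candidate indices by ascending rank.
    order = sorted(range(n_candidates), key=lambda i: cv_results["rank"][i])
    return [
        (
            int(cv_results["rank"][i]),
            float(cv_results["mean_score"][i]),
            cv_results["params"][i],
        )
        for i in order[:top]
    ]


# Mock results standing in for search.cv_results_.
mock_results = {
    "params": [{"half_life": 20}, {"half_life": 60}, {"half_life": 90}],
    "mean_score": [0.8, 1.2, 1.0],
    "rank": [3, 1, 2],
    "fit_time": [1.1, 1.3, 1.2],
}
leaderboard = top_candidates(mock_results, top=2)
```

For multi-metric scoring the same idea would apply with the suffixed keys (mean_score_<name>, rank_<name>).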

best_estimator_ : BaseEstimator

The selected candidate estimator, already updated through the full sample during its walk-forward pass. Only available when refit is not False.

best_score_ : float

Aggregate score of the selected best candidate. Available when best_index_ is defined and refit is not callable.

best_params_ : dict

Parameter setting that gave the selected best score. Available when best_index_ is defined.

best_index_ : int

Index into cv_results_ of the best candidate. Available for single-metric scoring and for multi-metric scoring when refit is not False.

multimetric_ : bool

Whether or not the scorers compute several metrics.

is_portfolio_estimator_ : bool

Whether or not the estimator is a portfolio optimization estimator.

Methods

`fit`(X[, y])	Run the online search over all candidate parameter combinations.
`get_metadata_routing`()	Get metadata routing of this object.
`get_params`([deep])	Get parameters for this estimator.
`predict`(X)	Predict using the best estimator found during search.
`score`(X[, y])	Score using the best estimator found during search.
`set_params`(**params)	Set the parameters of this estimator.

See also

Online Covariance Hyperparameter Tuning: Randomized online tuning of covariance estimator hyperparameters.

Examples

>>> from scipy.stats import uniform
>>> from skfolio.datasets import load_sp500_dataset
>>> from skfolio.model_selection import OnlineRandomizedSearch
>>> from skfolio.moments import EWCovariance, EWMu
>>> from skfolio.optimization import MeanRisk
>>> from skfolio.preprocessing import prices_to_returns
>>> from skfolio.prior import EmpiricalPrior
>>>
>>> prices = load_sp500_dataset()
>>> X = prices_to_returns(prices)
>>>
>>> model = MeanRisk(
...     prior_estimator=EmpiricalPrior(
...         mu_estimator=EWMu(),
...         covariance_estimator=EWCovariance(),
...     ),
... )
>>> search = OnlineRandomizedSearch(
...     model,
...     param_distributions={
...         "prior_estimator__mu_estimator__half_life": uniform(10, 90),
...         "prior_estimator__covariance_estimator__half_life": uniform(10, 90),
...     },
...     n_iter=20,
...     warmup_size=252,
...     test_size=5,
...     n_jobs=-1,
...     random_state=42,
... )
>>> search.fit(X)
>>> search.best_params_
>>> search.best_estimator_

fit(X, y=None, **fit_params)

Run the online search over all candidate parameter combinations.

Parameters:

X : array-like of shape (n_observations, n_assets)

Price returns.

y : array-like, optional

Optional target.

**fit_params

Additional parameters routed via metadata routing.

Returns:

self

get_metadata_routing()

Get metadata routing of this object.

See the User Guide for details on how the routing mechanism works.

Returns:

routing : MetadataRequest

A MetadataRequest encapsulating routing information.

get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep : bool, default=True

If True, return the parameters for this estimator and for contained subobjects that are estimators.

Returns:

params : dict

Parameter names mapped to their values.

predict(X)

Predict using the best estimator found during search.

Parameters:

X : array-like of shape (n_observations, n_assets)

Price returns.

Returns:

prediction : Portfolio | Population

score(X, y=None)

Score using the best estimator found during search.

Parameters:

X : array-like of shape (n_observations, n_assets)

Price returns.

y : Ignored

Present for scikit-learn API compatibility.

Returns:

score : float

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:

**params : dict

Estimator parameters.

Returns:

self : estimator instance

Estimator instance.