skfolio.optimization.NestedClustersOptimization#

class skfolio.optimization.NestedClustersOptimization(inner_estimator=None, outer_estimator=None, distance_estimator=None, clustering_estimator=None, cv=None, quantile=0.5, quantile_measure=Sharpe Ratio, n_jobs=None, verbose=0, portfolio_params=None, fallback=None, previous_weights=None, raise_on_failure=True)[source]#

Nested Clusters Optimization estimator.

Nested Clusters Optimization (NCO) is a portfolio optimization method developed by Marcos Lopez de Prado.

It uses a distance matrix to compute clusters using a clustering algorithm ( Hierarchical Tree Clustering, KMeans, etc..). For each cluster, the inner-cluster weights are computed by fitting the inner-estimator on each cluster using the whole training data. Then the outer-cluster weights are computed by training the outer-estimator using out-of-sample estimates of the inner-estimators with cross-validation. Finally, the final assets weights are the dot-product of the inner-weights and outer-weights.

Note

The original paper uses KMeans as the clustering algorithm, minimum Variance for the inner-estimator and equal-weighted for the outer-estimator. Here we generalize it to all sklearn and skfolio clustering algorithms (HierarchicalClustering, KMeans, etc.), all portfolio optimizations (Mean-Variance, HRP, etc.) and risk measures (Variance, CVaR, etc.). To avoid data leakage at the outer-estimator, we use out-of-sample estimates to fit the outer estimator.

Parameters:
inner_estimatorBaseOptimization, optional

Optimization estimator used to estimate the inner-weights (also called intra-weights) which are the assets weights inside each cluster. The default None is to use MeanRisk.

outer_estimatorBaseOptimization, optional

Optimization estimator used to estimate the outer-weights (also called inter-weights) which are the weights applied to each cluster. The default None is to use MeanRisk.

distance_estimatorBaseDistance, optional

Distance estimator. The distance estimator is used to estimate the codependence and the distance matrix needed for the computation of the linkage matrix. The default (None) is to use PearsonDistance.

clustering_estimatorBaseEstimator, optional

Clustering estimator. Must expose a labels_ attribute after fitting. The clustering estimator is used to compute the clusters of the assets based on the distance matrix. The default (None) is to use HierarchicalClustering.

Note

Clustering estimators from sklearn are also supported. For example: sklearn.cluster.KMeans.

cvBaseCrossValidator | BaseCombinatorialCV | int | “ignore”, optional

Determines the cross-validation splitting strategy. The default (None) is to use the 5-fold cross validation KFold(). It is applied to the inner-estimators. Its out-of-sample outputs are used to train the outer-estimator. Possible inputs for cv are:

  • “ignore”: no cross-validation is used (note that it will likely lead to data leakage with a high risk of overfitting)

  • Integer, to specify the number of folds in a sklearn.model_selection.KFold

  • An object to be used as a cross-validation generator

  • An iterable yielding train, test splits

  • A CombinatorialPurgedCV

If a CombinatorialCV cross-validator is used, each cluster out-of-sample outputs becomes a collection of multiple paths instead of one single path. The selected out-of-sample path among this collection of paths is chosen according to the quantile and quantile_measure parameters.

n_jobsint, optional

The number of jobs to run in parallel for fit of all estimators. The value -1 means using all processors. The default (None) means 1 unless in a joblib.parallel_backend context.

quantilefloat, default=0.5

Quantile for a given measure (quantile_measure) of the out-of-sample inner-estimator paths when the cv parameter is a CombinatorialPurgedCV cross-validator. The default value is 0.5 corresponding to the path with the median measure. (see cv)

quantile_measurePerfMeasure or RatioMeasure or RiskMeasure or ExtraRiskMeasure, default=RatioMeasure.SHARPE_RATIO

Measure used for the quantile path selection (see quantile and cv). The default is RatioMeasure.SHARPE_RATIO.

verboseint, default=0

The verbosity level. The default value is 0.

portfolio_paramsdict, optional

Portfolio parameters forwarded to the resulting Portfolio in predict. If not provided and if available on the estimator, the following attributes are propagated to the portfolio by default: name and previous_weights.

fallbackBaseOptimization | “previous_weights” | list[BaseOptimization | “previous_weights”], optional

Fallback estimator or a list of estimators to try, in order, when the primary optimization raises during fit. Alternatively, use "previous_weights" (alone or in a list) to fall back to the estimator’s previous_weights. When a fallback succeeds, its fitted weights_ are copied back to the primary estimator so that fit still returns the original instance. For traceability, fallback_ stores the successful estimator (or the string "previous_weights")

and fallback_chain_ stores each attempt with the associated outcome.

previous_weightsfloat | dict[str, float] | array-like of shape (n_assets,), optional

When fallback="previous_weights", failures will fall back to these weights if provided.

raise_on_failurebool, default=True

Controls error handling when fitting fails. If True, any failure during fit is raised immediately, no weights_ are set and subsequent calls to predict will raise a NotFittedError. If False, errors are not raised; instead, a warning is emitted, weights_ is set to None and subsequent calls to predict will return a FailedPortfolio. When fallbacks are specified, this behavior applies only after all fallbacks have been exhausted.

Attributes:
weights_ndarray of shape (n_assets,)

Weights of the assets.

distance_estimator_BaseDistance

Fitted distance_estimator.

inner_estimators_list[BaseOptimization]

List of fitted inner_estimator. One per cluster for clusters containing more than one asset.

outer_estimator_BaseOptimization

Fitted outer_estimator.

clustering_estimator_BaseEstimator

Fitted clustering_estimator.

n_features_in_int

Number of assets seen during fit.

feature_names_in_ndarray of shape (n_features_in_,)

Names of assets seen during fit. Defined only when X has assets names that are all strings.

fallback_BaseOptimization | “previous_weights” | None

The fallback estimator instance, or the string "previous_weights", that produced the final result. None if no fallback was used.

fallback_chain_list[tuple[str, str]] | None

Sequence describing the optimization fallback attempts. Each element is a pair (estimator_repr, outcome) where estimator_repr is the string representation of the primary estimator or a fallback (e.g. "EqualWeighted()", "previous_weights"), and outcome is "success" if that step produced a valid solution, otherwise the stringified error message. For successful fits without any fallback, this is None.

error_str | list[str] | None

Captured error message(s) when fit fails. For multi-portfolio outputs (weights_ is 2D), this is a list aligned with portfolios.

Methods

fit(X[, y])

Fit the Nested Clusters Optimization estimator.

fit_predict(X)

Perform fit on X and returns the predicted Portfolio or Population of Portfolio on X based on the fitted weights.

get_metadata_routing()

Get metadata routing of this object.

get_params([deep])

Get parameters for this estimator.

predict(X)

Predict the Portfolio or a Population of portfolios on X.

score(X[, y])

Prediction score using the Sharpe Ratio.

set_params(**params)

Set the parameters of this estimator.

Notes

All estimators should specify all parameters as explicit keyword arguments in __init__ (no *args or **kwargs), following scikit-learn conventions.

References

[1]

“Building diversified portfolios that outperform out of sample”, The Journal of Portfolio Management, Marcos López de Prado (2016)

[2]

“A robust estimator of the efficient frontier”, SSRN Electronic Journal, Marcos López de Prado (2019)

[3]

“Machine Learning for Asset Managers”, Elements in Quantitative Finance. Cambridge University Press, Marcos López de Prado (2020)

fit(X, y=None, **fit_params)[source]#

Fit the Nested Clusters Optimization estimator.

Parameters:
Xarray-like of shape (n_observations, n_assets)

Price returns of the assets.

yarray-like of shape (n_observations, n_targets), optional

Price returns of factors or a target benchmark. The default is None.

**fit_paramsdict

Parameters to pass to the underlying estimators. Only available if enable_metadata_routing=True, which can be set by using sklearn.set_config(enable_metadata_routing=True). See Metadata Routing User Guide for more details.

Returns:
selfNestedClustersOptimization

Fitted estimator.

fit_predict(X)#

Perform fit on X and returns the predicted Portfolio or Population of Portfolio on X based on the fitted weights. For factor models, use fit(X, y) then predict(X) separately.

If fitting fails and raise_on_failure=False, this returns a FailedPortfolio.

Parameters:
Xarray-like of shape (n_observations, n_assets)

Price returns of the assets.

Returns:
Portfolio | Population

The predicted Portfolio or Population based on the fitted weights.

get_metadata_routing()[source]#

Get metadata routing of this object.

Please check User Guide on how the routing mechanism works.

Returns:
routingMetadataRequest

A MetadataRequest encapsulating routing information.

get_params(deep=True)#

Get parameters for this estimator.

Parameters:
deepbool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
paramsdict

Parameter names mapped to their values.

property needs_previous_weights#

Whether previous_weights must be propagated between folds/rebalances.

Used by cross_val_predict to decide whether to run sequentially and pass the weights from the previous rebalancing to the next. This is True when transaction costs, a maximum turnover, or a fallback depending on previous_weights are present.

predict(X)#

Predict the Portfolio or a Population of portfolios on X.

Optimization estimators can return a 1D or a 2D array of weights. For a 1D array, the prediction is a single Portfolio. For a 2D array, the prediction is a Population of Portfolio.

If name is not provided in the portfolio parameters, the estimator class name is used.

Parameters:
Xarray-like of shape (n_observations, n_assets) | ReturnDistribution

Asset returns or a ReturnDistribution carrying returns and optional sample weights.

Returns:
Portfolio | Population

The predicted Portfolio or Population based on the fitted weights.

score(X, y=None)#

Prediction score using the Sharpe Ratio. If the prediction is a single Portfolio, the score is its Sharpe Ratio. If the prediction is a Population, the score is the mean Sharpe Ratio across portfolios.

Parameters:
Xarray-like of shape (n_observations, n_assets)

Price returns of the assets.

yIgnored

Not used, present here for API consistency by convention.

Returns:
scorefloat

The Sharpe Ratio of the portfolio if the prediction is a single Portfolio or the mean of all the portfolios Sharpe Ratios if the prediction is a Population of Portfolio.

set_params(**params)#

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:
**paramsdict

Estimator parameters.

Returns:
selfestimator instance

Estimator instance.