skfolio.optimization.NestedClustersOptimization

class skfolio.optimization.NestedClustersOptimization(inner_estimator=None, outer_estimator=None, distance_estimator=None, clustering_estimator=None, cv=None, quantile=0.5, quantile_measure=RatioMeasure.SHARPE_RATIO, n_jobs=None, verbose=0, portfolio_params=None)

Nested Clusters Optimization estimator.

Nested Clusters Optimization (NCO) is a portfolio optimization method developed by Marcos Lopez de Prado.

It uses a distance matrix to group assets into clusters with a clustering algorithm (Hierarchical Tree Clustering, KMeans, etc.). For each cluster, the inner-cluster weights are computed by fitting the inner-estimator on that cluster's assets using the whole training data. The outer-cluster weights are then computed by training the outer-estimator on out-of-sample estimates of the inner-estimators obtained with cross-validation. The final asset weights are the dot product of the inner-weights and outer-weights.
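The last step, combining inner and outer weights, can be sketched with plain NumPy. This is only an illustration under stated assumptions: the cluster labels, inner weights and outer weights below are made-up values, not outputs of skfolio.

```python
import numpy as np

# Hypothetical cluster labels for 5 assets (what a clustering estimator's
# labels_ attribute might look like).
labels = np.array([0, 0, 1, 1, 1])
n_assets, n_clusters = labels.size, int(labels.max()) + 1

# Hypothetical inner weights: within each cluster they sum to 1.
inner_by_cluster = {0: np.array([0.6, 0.4]), 1: np.array([0.5, 0.3, 0.2])}

# Inner-weights matrix of shape (n_assets, n_clusters), zero outside each cluster.
inner_weights = np.zeros((n_assets, n_clusters))
for cluster, w in inner_by_cluster.items():
    inner_weights[labels == cluster, cluster] = w

# Hypothetical outer weights: allocation across the two clusters.
outer_weights = np.array([0.7, 0.3])

# Final asset weights: dot product of inner weights and outer weights.
final_weights = inner_weights @ outer_weights
print(final_weights)
```

Because each column of the inner-weights matrix sums to 1 and the outer weights sum to 1, the final weights also sum to 1.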

Note

The original paper uses KMeans as the clustering algorithm, Minimum Variance for the inner-estimator and equal weighting for the outer-estimator. Here we generalize it to all sklearn and skfolio clustering algorithms (HierarchicalClustering, KMeans, etc.), all portfolio optimizations (Mean-Variance, HRP, etc.) and all risk measures (Variance, CVaR, etc.). To avoid data leakage at the outer-estimator, we use out-of-sample estimates to fit the outer estimator.

Parameters:

inner_estimator : BaseOptimization, optional

Optimization estimator used to estimate the inner-weights (also called intra-weights), which are the asset weights inside each cluster. The default (None) is to use MeanRisk.

outer_estimator : BaseOptimization, optional

Optimization estimator used to estimate the outer-weights (also called inter-weights), which are the weights applied to each cluster. The default (None) is to use MeanRisk.

distance_estimator : BaseDistance, optional

Distance estimator. The distance estimator is used to estimate the codependence and the distance matrix needed for the computation of the linkage matrix. The default (None) is to use PearsonDistance.

clustering_estimator : BaseEstimator, optional

Clustering estimator. Must expose a labels_ attribute after fitting. The clustering estimator is used to compute the clusters of the assets based on the distance matrix. The default (None) is to use HierarchicalClustering.

Note

Clustering estimators from sklearn are also supported. For example: sklearn.cluster.KMeans.

cv : BaseCrossValidator | BaseCombinatorialCV | int | “ignore”, optional

Determines the cross-validation splitting strategy. The default (None) is to use the 5-fold cross validation KFold(). It is applied to the inner-estimators. Its out-of-sample outputs are used to train the outer-estimator. Possible inputs for cv are:

“ignore”: no cross-validation is used (note that it will likely lead to data leakage with a high risk of overfitting)

Integer, to specify the number of folds in a sklearn.model_selection.KFold

An object to be used as a cross-validation generator

An iterable yielding train, test splits

A CombinatorialPurgedCV

If a CombinatorialCV cross-validator is used, each cluster's out-of-sample output becomes a collection of multiple paths instead of a single path. The out-of-sample path is then selected from this collection according to the quantile and quantile_measure parameters.

n_jobs : int, optional

The number of jobs to run in parallel when fitting the estimators. The value -1 means using all processors. The default (None) means 1 unless in a joblib.parallel_backend context.

quantile : float, default=0.5

Quantile for a given measure (quantile_measure) of the out-of-sample inner-estimator paths when the cv parameter is a CombinatorialPurgedCV cross-validator. The default value is 0.5 corresponding to the path with the median measure. (see cv)

quantile_measure : PerfMeasure or RatioMeasure or RiskMeasure or ExtraRiskMeasure, default=RatioMeasure.SHARPE_RATIO

Measure used for the quantile path selection (see quantile and cv). The default is RatioMeasure.SHARPE_RATIO.

verbose : int, default=0

The verbosity level. The default value is 0.

portfolio_params : dict, optional

Portfolio parameters passed to the portfolio evaluated by the predict and score methods. If not provided, the name is copied from the optimization model and systematically passed to the portfolio.
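The path selection controlled by quantile and quantile_measure (see cv) can be pictured with a small NumPy sketch. The per-path Sharpe Ratios are made-up numbers, and picking the path whose measure is closest to the quantile value is an assumption made for illustration; the library's exact selection rule may differ.

```python
import numpy as np

# Hypothetical Sharpe Ratios of five out-of-sample paths for one cluster.
path_sharpe = np.array([0.8, 1.4, 0.3, 1.1, 0.9])
quantile = 0.5  # median path

# Value of the measure at the requested quantile.
target = np.quantile(path_sharpe, quantile)

# Assumed rule: keep the path whose measure is closest to that value.
selected_path = int(np.argmin(np.abs(path_sharpe - target)))
print(selected_path, path_sharpe[selected_path])
```

With these numbers the median Sharpe Ratio is 0.9, so the path at index 4 is kept.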

Attributes:

weights_ : ndarray of shape (n_assets,)

Weights of the assets.

distance_estimator_ : BaseDistance

Fitted distance_estimator.

inner_estimators_ : list[BaseOptimization]

List of fitted inner_estimator. One per cluster for clusters containing more than one asset.

outer_estimator_ : BaseOptimization

Fitted outer_estimator.

clustering_estimator_ : BaseEstimator

Fitted clustering_estimator.

n_features_in_ : int

Number of assets seen during fit.

feature_names_in_ : ndarray of shape (n_features_in_,)

Names of assets seen during fit. Defined only when X has asset names that are all strings.

Methods

`fit`(X[, y])	Fit the Nested Clusters Optimization estimator.
`fit_predict`(X)	Perform `fit` on `X` and return the predicted `Portfolio` or `Population` of `Portfolio` on `X` based on the fitted `weights`.
`get_metadata_routing`()	Get metadata routing of this object.
`get_params`([deep])	Get parameters for this estimator.
`predict`(X)	Predict the `Portfolio` or `Population` of `Portfolio` on `X` based on the fitted weights.
`score`(X[, y])	Prediction score.
`set_params`(**params)	Set the parameters of this estimator.

References

[1] “Building diversified portfolios that outperform out of sample”, The Journal of Portfolio Management, Marcos López de Prado (2016)

[2] “A robust estimator of the efficient frontier”, SSRN Electronic Journal, Marcos López de Prado (2019)

[3] “Machine Learning for Asset Managers”, Elements in Quantitative Finance. Cambridge University Press, Marcos López de Prado (2020)

fit(X, y=None, **fit_params)

Fit the Nested Clusters Optimization estimator.

Parameters:

X : array-like of shape (n_observations, n_assets)

Price returns of the assets.

y : array-like of shape (n_observations, n_targets), optional

Price returns of factors or a target benchmark. The default is None.

**fit_params : dict

Parameters to pass to the underlying estimators. Only available if enable_metadata_routing=True, which can be set by using sklearn.set_config(enable_metadata_routing=True). See the Metadata Routing User Guide for more details.

Returns:

self : NestedClustersOptimization

Fitted estimator.
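The out-of-sample step performed during fit (inner-estimators evaluated on held-out folds, as described for cv) can be sketched in NumPy for a single cluster. Everything here is illustrative: the returns are synthetic, the folds are contiguous and unshuffled, and the inverse-variance helper is a hypothetical stand-in for an inner estimator, not skfolio's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic returns for one cluster of 3 assets over 100 observations.
X = rng.normal(0.0005, 0.01, size=(100, 3))


def inverse_variance_weights(returns):
    """Hypothetical inner estimator: weights proportional to 1 / variance."""
    inv_var = 1.0 / returns.var(axis=0)
    return inv_var / inv_var.sum()


n_folds = 5
all_idx = np.arange(X.shape[0])
folds = np.array_split(all_idx, n_folds)  # contiguous, unshuffled folds

# Out-of-sample returns of the cluster, one value per observation.
oos_cluster_returns = np.empty(X.shape[0])
for test_idx in folds:
    train_idx = np.setdiff1d(all_idx, test_idx)
    w = inverse_variance_weights(X[train_idx])       # fit on training folds only
    oos_cluster_returns[test_idx] = X[test_idx] @ w  # evaluate on the held-out fold
```

Stacking one such series per cluster gives a returns matrix on which an outer estimator can be trained without having seen the data used to fit the inner weights.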

fit_predict(X)

Perform fit on X and return the predicted Portfolio or Population of Portfolio on X based on the fitted weights. For factor models, use fit(X, y) then predict(X) separately.

Parameters:

X : array-like of shape (n_observations, n_assets)

Price returns of the assets.

Returns:

prediction : Portfolio | Population

Portfolio or Population of Portfolio estimated on X based on the fitted weights.

get_metadata_routing()

Get metadata routing of this object.

See the User Guide for how the routing mechanism works.

Returns:

routing : MetadataRequest

A MetadataRequest encapsulating routing information.

get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep : bool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params : dict

Parameter names mapped to their values.

predict(X)

Predict the Portfolio or Population of Portfolio on X based on the fitted weights.

Optimization estimators can return a 1D or a 2D array of weights. For a 1D array, the prediction returns a Portfolio. For a 2D array, the prediction returns a Population of Portfolio.

If name is not provided in the portfolio arguments, we use the first 500 characters of the estimator name.

Parameters:

X : array-like of shape (n_observations, n_assets)

Price returns of the assets.

Returns:

prediction : Portfolio | Population

Portfolio or Population of Portfolio estimated on X based on the fitted weights.

score(X, y=None)

Prediction score. If the prediction is a single Portfolio, the score is the Sharpe Ratio. If the prediction is a Population of Portfolio, the score is the mean of the Sharpe Ratios of all portfolios in the population.

Parameters:

X : array-like of shape (n_observations, n_assets)

Price returns of the assets.

y : Ignored

Not used, present here for API consistency by convention.

Returns:

score : float

The Sharpe Ratio of the portfolio if the prediction is a single Portfolio, or the mean of the Sharpe Ratios of all portfolios if the prediction is a Population of Portfolio.
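The scoring rule can be illustrated with a small NumPy sketch. It assumes a Sharpe Ratio computed as mean return over sample standard deviation, with no risk-free rate and no annualization; the helper names and return series are hypothetical and the library's own computation may differ in those details.

```python
import numpy as np


def sharpe_ratio(returns):
    """Mean over sample standard deviation (assumed definition)."""
    returns = np.asarray(returns, dtype=float)
    return returns.mean() / returns.std(ddof=1)


def score(prediction_returns):
    """Single series: its Sharpe Ratio. Several series: the mean Sharpe Ratio."""
    arr = np.asarray(prediction_returns, dtype=float)
    if arr.ndim == 1:
        return float(sharpe_ratio(arr))
    return float(np.mean([sharpe_ratio(r) for r in arr]))


single = [0.01, 0.02, 0.03]
population = [[0.01, 0.02, 0.03], [0.00, 0.02, 0.04]]
print(score(single), score(population))
```

Here the single series has a Sharpe Ratio of 2.0, the second series 1.0, so the population score is their mean, 1.5.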

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:

**params : dict

Estimator parameters.

Returns:

self : estimator instance

Estimator instance.