skfolio.optimization.NestedClustersOptimization
- class skfolio.optimization.NestedClustersOptimization(inner_estimator=None, outer_estimator=None, distance_estimator=None, clustering_estimator=None, cv=None, quantile=0.5, quantile_measure=RatioMeasure.SHARPE_RATIO, n_jobs=None, verbose=0, portfolio_params=None, fallback=None, previous_weights=None, raise_on_failure=True)
Nested Clusters Optimization estimator.
Nested Clusters Optimization (NCO) is a portfolio optimization method developed by Marcos López de Prado.
It uses a distance matrix to compute clusters using a clustering algorithm (Hierarchical Tree Clustering, KMeans, etc.). For each cluster, the inner-cluster weights are computed by fitting the inner-estimator on that cluster using the whole training data. Then the outer-cluster weights are computed by training the outer-estimator on out-of-sample estimates of the inner-estimators obtained with cross-validation. Finally, the final asset weights are the dot-product of the inner-weights and outer-weights.
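The final step above can be sketched in plain Python. This is only an illustration with made-up numbers, not the library's implementation: the inner-weights are laid out as a hypothetical (n_assets, n_clusters) matrix in which each column holds the weights of one cluster, and the final weights are its dot-product with the outer-weights.

```python
# Hypothetical setup: 5 assets split into 2 clusters.
# Each row is an asset, each column a cluster; an asset has a non-zero
# inner-weight only in the column of the cluster it belongs to.
inner_weights = [
    [0.6, 0.0],  # asset 0, cluster 0
    [0.4, 0.0],  # asset 1, cluster 0
    [0.0, 0.5],  # asset 2, cluster 1
    [0.0, 0.3],  # asset 3, cluster 1
    [0.0, 0.2],  # asset 4, cluster 1
]

# One outer-weight per cluster (illustrative values).
outer_weights = [0.7, 0.3]

# Dot-product of the inner-weights matrix with the outer-weights vector.
final_weights = [
    sum(w * o for w, o in zip(row, outer_weights)) for row in inner_weights
]
```

Because the inner-weights of each cluster sum to one and the outer-weights sum to one, the final weights in this sketch also sum to one.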
Note
The original paper uses KMeans as the clustering algorithm, minimum variance for the inner-estimator and equal-weighted for the outer-estimator. Here we generalize it to all sklearn and skfolio clustering algorithms (HierarchicalClustering, KMeans, etc.), all portfolio optimizations (Mean-Variance, HRP, etc.) and risk measures (Variance, CVaR, etc.). To avoid data leakage at the outer-estimator, we use out-of-sample estimates to fit the outer estimator.
- Parameters:
- inner_estimator : BaseOptimization, optional
Optimization estimator used to estimate the inner-weights (also called intra-weights), which are the asset weights inside each cluster. The default (None) is to use MeanRisk.
- outer_estimator : BaseOptimization, optional
Optimization estimator used to estimate the outer-weights (also called inter-weights), which are the weights applied to each cluster. The default (None) is to use MeanRisk.
- distance_estimator : BaseDistance, optional
Distance estimator. It is used to estimate the codependence and the distance matrix needed for the computation of the linkage matrix. The default (None) is to use PearsonDistance.
- clustering_estimator : BaseEstimator, optional
Clustering estimator. Must expose a labels_ attribute after fitting. It is used to compute the clusters of the assets based on the distance matrix. The default (None) is to use HierarchicalClustering.
Note
Clustering estimators from sklearn are also supported. For example: sklearn.cluster.KMeans.
- cv : BaseCrossValidator | BaseCombinatorialCV | int | "ignore", optional
Determines the cross-validation splitting strategy. The default (None) is to use the 5-fold cross-validation KFold(). It is applied to the inner-estimators, and its out-of-sample outputs are used to train the outer-estimator. Possible inputs for cv are:
  - "ignore": no cross-validation is used (note that this will likely lead to data leakage with a high risk of overfitting)
  - An integer, to specify the number of folds in a sklearn.model_selection.KFold
  - An object to be used as a cross-validation generator
  - An iterable yielding train, test splits
If a CombinatorialCV cross-validator is used, the out-of-sample output of each cluster becomes a collection of multiple paths instead of a single path. The out-of-sample path is then selected from this collection according to the quantile and quantile_measure parameters.
- n_jobs : int, optional
The number of jobs to run in parallel for fit of all estimators. The value -1 means using all processors. The default (None) means 1 unless in a joblib.parallel_backend context.
- quantile : float, default=0.5
Quantile for a given measure (quantile_measure) of the out-of-sample inner-estimator paths when the cv parameter is a CombinatorialPurgedCV cross-validator (see cv). The default value is 0.5, corresponding to the path with the median measure.
- quantile_measure : PerfMeasure or RatioMeasure or RiskMeasure or ExtraRiskMeasure, default=RatioMeasure.SHARPE_RATIO
Measure used for the quantile path selection (see quantile and cv). The default is RatioMeasure.SHARPE_RATIO.
- verbose : int, default=0
The verbosity level. The default value is 0.
- portfolio_params : dict, optional
Portfolio parameters forwarded to the resulting Portfolio in predict. If not provided and if available on the estimator, the following attributes are propagated to the portfolio by default: name and previous_weights.
- fallback : BaseOptimization | "previous_weights" | list[BaseOptimization | "previous_weights"], optional
Fallback estimator or a list of estimators to try, in order, when the primary optimization raises during fit. Alternatively, use "previous_weights" (alone or in a list) to fall back to the estimator's previous_weights. When a fallback succeeds, its fitted weights_ are copied back to the primary estimator so that fit still returns the original instance. For traceability, fallback_ stores the successful estimator (or the string "previous_weights") and fallback_chain_ stores each attempt with the associated outcome.
- previous_weights : float | dict[str, float] | array-like of shape (n_assets,), optional
When fallback="previous_weights", failures will fall back to these weights if provided.
- raise_on_failure : bool, default=True
Controls error handling when fitting fails. If True, any failure during fit is raised immediately, no weights_ are set and subsequent calls to predict will raise a NotFittedError. If False, errors are not raised; instead, a warning is emitted, weights_ is set to None and subsequent calls to predict will return a FailedPortfolio. When fallbacks are specified, this behavior applies only after all fallbacks have been exhausted.
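To make the interplay of cv, quantile and quantile_measure more concrete, here is a minimal pure-Python sketch of picking one out-of-sample path from a collection of paths. The helper names (sharpe_ratio, select_path), the sample returns and the nearest-rank rule are assumptions made for illustration; the library's exact selection rule may differ.

```python
import statistics


def sharpe_ratio(returns):
    """Illustrative, non-annualized Sharpe Ratio: mean over sample std."""
    return statistics.mean(returns) / statistics.stdev(returns)


def select_path(paths, q=0.5, measure=sharpe_ratio):
    """Rank the paths by `measure` and return the one at quantile `q`.

    Assumption: a simple nearest-rank rule on the sorted paths.
    """
    ranked = sorted(paths, key=measure)
    index = round(q * (len(ranked) - 1))
    return ranked[index]


# Three hypothetical out-of-sample return paths for one cluster.
paths = [
    [0.01, -0.02, 0.03, 0.00],
    [0.02, 0.01, 0.03, 0.01],
    [-0.01, -0.02, 0.01, 0.00],
]

median_path = select_path(paths, q=0.5)  # path with the median Sharpe Ratio
best_path = select_path(paths, q=1.0)    # path with the highest Sharpe Ratio
```

With quantile=0.5 the path with the median measure is kept, which matches the default described above; a lower quantile would select a more pessimistic path.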
- Attributes:
- weights_ : ndarray of shape (n_assets,)
Weights of the assets.
- distance_estimator_ : BaseDistance
Fitted distance_estimator.
- inner_estimators_ : list[BaseOptimization]
List of fitted inner_estimator instances, one per cluster containing more than one asset.
- outer_estimator_ : BaseOptimization
Fitted outer_estimator.
- clustering_estimator_ : BaseEstimator
Fitted clustering_estimator.
- n_features_in_ : int
Number of assets seen during fit.
- feature_names_in_ : ndarray of shape (n_features_in_,)
Names of assets seen during fit. Defined only when X has asset names that are all strings.
- fallback_ : BaseOptimization | "previous_weights" | None
The fallback estimator instance, or the string "previous_weights", that produced the final result. None if no fallback was used.
- fallback_chain_ : list[tuple[str, str]] | None
Sequence describing the optimization fallback attempts. Each element is a pair (estimator_repr, outcome) where estimator_repr is the string representation of the primary estimator or a fallback (e.g. "EqualWeighted()", "previous_weights"), and outcome is "success" if that step produced a valid solution, otherwise the stringified error message. For successful fits without any fallback, this is None.
- error_ : str | list[str] | None
Captured error message(s) when fit fails. For multi-portfolio outputs (weights_ is 2D), this is a list aligned with portfolios.
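The bookkeeping described for fallback_ and fallback_chain_ can be mimicked with a short pure-Python sketch. Everything here is hypothetical (the fit_with_fallbacks helper, the solver callables and the names used as string representations), and the sketch returns the fallback's name where the real attribute stores the estimator instance; it only illustrates the order of attempts and what gets recorded.

```python
def fit_with_fallbacks(primary, fallbacks):
    """Try the primary solver, then each fallback in order.

    `primary` and each fallback are hypothetical (name, solve) pairs, where
    `solve` is a zero-argument callable that returns weights or raises.
    Returns (weights, fallback_used, fallback_chain).
    """
    chain = []
    for position, (name, solve) in enumerate([primary, *fallbacks]):
        try:
            weights = solve()
        except Exception as exc:
            # Record the stringified error and move on to the next candidate.
            chain.append((name, str(exc)))
            continue
        chain.append((name, "success"))
        if position == 0:
            # Primary succeeded: no fallback was used, so nothing is recorded.
            return weights, None, None
        return weights, name, chain
    # Every attempt failed: no weights are available.
    return None, None, chain


def failing_solver():
    raise ValueError("solver failed")


def equal_weights():
    return [0.5, 0.5]


weights, fallback_used, fallback_chain = fit_with_fallbacks(
    ("MeanRisk()", failing_solver),
    [("EqualWeighted()", equal_weights)],
)
```

In this run the primary attempt fails and the first fallback succeeds, so the chain holds two entries: the primary with its error message, then the fallback with "success".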
Methods
fit(X[, y]): Fit the Nested Clusters Optimization estimator.
fit_predict(X): Perform fit on X and return the predicted Portfolio or Population of Portfolio on X based on the fitted weights.
get_metadata_routing(): Get metadata routing of this object.
get_params([deep]): Get parameters for this estimator.
predict(X): Predict the Portfolio or a Population of portfolios on X.
score(X[, y]): Prediction score using the Sharpe Ratio.
set_params(**params): Set the parameters of this estimator.
Notes
All estimators should specify all parameters as explicit keyword arguments in __init__ (no *args or **kwargs), following scikit-learn conventions.
References
[1] “Building diversified portfolios that outperform out of sample”, The Journal of Portfolio Management, Marcos López de Prado (2016)
[2] “A robust estimator of the efficient frontier”, SSRN Electronic Journal, Marcos López de Prado (2019)
[3] “Machine Learning for Asset Managers”, Elements in Quantitative Finance. Cambridge University Press, Marcos López de Prado (2020)
- fit(X, y=None, **fit_params)
Fit the Nested Clusters Optimization estimator.
- Parameters:
- X : array-like of shape (n_observations, n_assets)
Price returns of the assets.
- y : array-like of shape (n_observations, n_targets), optional
Price returns of factors or a target benchmark. The default is None.
- **fit_params : dict
Parameters to pass to the underlying estimators. Only available if enable_metadata_routing=True, which can be set by using sklearn.set_config(enable_metadata_routing=True). See the Metadata Routing User Guide for more details.
- Returns:
- self : NestedClustersOptimization
Fitted estimator.
- fit_predict(X)
Perform fit on X and return the predicted Portfolio or Population of Portfolio on X based on the fitted weights. For factor models, use fit(X, y) then predict(X) separately.
If fitting fails and raise_on_failure=False, this returns a FailedPortfolio.
- Parameters:
- X : array-like of shape (n_observations, n_assets)
Price returns of the assets.
- Returns:
- Portfolio | Population
The predicted Portfolio or Population based on the fitted weights.
- get_metadata_routing()
Get metadata routing of this object.
Please check the User Guide on how the routing mechanism works.
- Returns:
- routing : MetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)
Get parameters for this estimator.
- Parameters:
- deep : bool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- params : dict
Parameter names mapped to their values.
- property needs_previous_weights
Whether previous_weights must be propagated between folds/rebalances.
Used by cross_val_predict to decide whether to run sequentially and pass the weights from the previous rebalancing to the next. This is True when transaction costs, a maximum turnover, or a fallback depending on previous_weights are present.
- predict(X)
Predict the Portfolio or a Population of portfolios on X.
Optimization estimators can return a 1D or a 2D array of weights. For a 1D array, the prediction is a single Portfolio. For a 2D array, the prediction is a Population of Portfolio.
If name is not provided in the portfolio parameters, the estimator class name is used.
- Parameters:
- X : array-like of shape (n_observations, n_assets) | ReturnDistribution
Asset returns or a ReturnDistribution carrying returns and optional sample weights.
- Returns:
- Portfolio | Population
The predicted Portfolio or Population based on the fitted weights.
- score(X, y=None)
Prediction score using the Sharpe Ratio. If the prediction is a single Portfolio, the score is its Sharpe Ratio. If the prediction is a Population, the score is the mean Sharpe Ratio across portfolios.
- Parameters:
- X : array-like of shape (n_observations, n_assets)
Price returns of the assets.
- y : Ignored
Not used, present here for API consistency by convention.
- Returns:
- score : float
The Sharpe Ratio of the portfolio if the prediction is a single Portfolio, or the mean of the Sharpe Ratios of all portfolios if the prediction is a Population of Portfolio.
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
- **params : dict
Estimator parameters.
- Returns:
- self : estimator instance
Estimator instance.