skfolio.optimization.NestedClustersOptimization
- class skfolio.optimization.NestedClustersOptimization(inner_estimator=None, outer_estimator=None, distance_estimator=None, clustering_estimator=None, cv=None, quantile=0.5, quantile_measure=RatioMeasure.SHARPE_RATIO, n_jobs=None, verbose=0, portfolio_params=None)
Nested Clusters Optimization estimator.
Nested Clusters Optimization (NCO) is a portfolio optimization method developed by Marcos Lopez de Prado.
It uses a distance matrix to compute clusters with a clustering algorithm (Hierarchical Tree Clustering, KMeans, etc.). For each cluster, the inner-cluster weights are computed by fitting the inner-estimator on that cluster using the whole training data. The outer-cluster weights are then computed by training the outer-estimator on out-of-sample estimates of the inner-estimators obtained with cross-validation. The final asset weights are the dot product of the inner-weights and the outer-weights.
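The final combination step can be illustrated with a small NumPy sketch. The cluster labels and weight values below are hypothetical and chosen by hand; they are not produced by any estimator, and the layout of the inner-weight matrix (one column per cluster) is an assumption made for illustration.

```python
import numpy as np

# Hypothetical setup: 5 assets grouped into 2 clusters.
labels = np.array([0, 0, 1, 1, 1])
n_assets, n_clusters = labels.size, 2

# Inner (intra-cluster) weights: one column per cluster, zero outside the cluster.
inner_weights = np.zeros((n_assets, n_clusters))
inner_weights[labels == 0, 0] = [0.6, 0.4]
inner_weights[labels == 1, 1] = [0.5, 0.3, 0.2]

# Outer (inter-cluster) weights: one weight per cluster.
outer_weights = np.array([0.7, 0.3])

# Final asset weights: dot product of inner and outer weights.
weights = inner_weights @ outer_weights
print(weights)
```

Because each column of the inner-weight matrix and the outer weights both sum to one in this sketch, the final weights also sum to one.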
Note
The original paper uses KMeans as the clustering algorithm, minimum variance for the inner-estimator and equal-weighted for the outer-estimator. Here we generalize it to all sklearn and skfolio clustering algorithms (HierarchicalClustering, KMeans, etc.), all portfolio optimizations (Mean-Variance, HRP, etc.) and risk measures (Variance, CVaR, etc.). To avoid data leakage at the outer-estimator, we use out-of-sample estimates to fit the outer estimator.
- Parameters:
- inner_estimator : BaseOptimization, optional
Optimization estimator used to estimate the inner-weights (also called intra-weights), which are the asset weights inside each cluster. The default (None) is to use MeanRisk.
- outer_estimator : BaseOptimization, optional
Optimization estimator used to estimate the outer-weights (also called inter-weights), which are the weights applied to each cluster. The default (None) is to use MeanRisk.
- distance_estimator : BaseDistance, optional
Distance estimator. The distance estimator is used to estimate the codependence and the distance matrix needed for the computation of the linkage matrix. The default (None) is to use PearsonDistance.
- clustering_estimator : BaseEstimator, optional
Clustering estimator. Must expose a labels_ attribute after fitting. The clustering estimator is used to compute the clusters of the assets based on the distance matrix. The default (None) is to use HierarchicalClustering.
Note
Clustering estimators from sklearn are also supported. For example: sklearn.cluster.KMeans.
- cv : BaseCrossValidator | BaseCombinatorialCV | int | "ignore", optional
Determines the cross-validation splitting strategy. The default (None) is to use 5-fold cross-validation, KFold(). It is applied to the inner-estimators, and its out-of-sample outputs are used to train the outer-estimator. Possible inputs for cv are:
  - "ignore": no cross-validation is used (note that this will likely lead to data leakage with a high risk of overfitting)
  - Integer, to specify the number of folds in a sklearn.model_selection.KFold
  - An object to be used as a cross-validation generator
  - An iterable yielding train, test splits
If a CombinatorialCV cross-validator is used, each cluster's out-of-sample output becomes a collection of multiple paths instead of a single path. The path selected from this collection is chosen according to the quantile and quantile_measure parameters.
- n_jobs : int, optional
The number of jobs to run in parallel for fit of all estimators. The value -1 means using all processors. The default (None) means 1 unless in a joblib.parallel_backend context.
- quantile : float, default=0.5
Quantile for a given measure (quantile_measure) of the out-of-sample inner-estimator paths when the cv parameter is a CombinatorialPurgedCV cross-validator. The default value is 0.5, corresponding to the path with the median measure (see cv).
- quantile_measure : PerfMeasure or RatioMeasure or RiskMeasure or ExtraRiskMeasure, default=RatioMeasure.SHARPE_RATIO
Measure used for the quantile path selection (see quantile and cv). The default is RatioMeasure.SHARPE_RATIO.
- verbose : int, default=0
The verbosity level. The default value is 0.
- portfolio_params : dict, optional
Portfolio parameters passed to the portfolio evaluated by the predict and score methods. If not provided, the name is copied from the optimization model and systematically passed to the portfolio.
- Attributes:
- weights_ : ndarray of shape (n_assets,)
Weights of the assets.
- distance_estimator_ : BaseDistance
Fitted distance_estimator.
- inner_estimators_ : list[BaseOptimization]
List of fitted inner_estimator. One per cluster for clusters containing more than one asset.
- outer_estimator_ : BaseOptimization
Fitted outer_estimator.
- clustering_estimator_ : BaseEstimator
Fitted clustering_estimator.
- n_features_in_ : int
Number of assets seen during fit.
- feature_names_in_ : ndarray of shape (n_features_in_,)
Names of assets seen during fit. Defined only when X has asset names that are all strings.
References
[1] “Building diversified portfolios that outperform out of sample”, The Journal of Portfolio Management, Marcos López de Prado (2016)
[2] “A robust estimator of the efficient frontier”, SSRN Electronic Journal, Marcos López de Prado (2019)
[3] “Machine Learning for Asset Managers”, Elements in Quantitative Finance. Cambridge University Press, Marcos López de Prado (2020)
Methods
- fit(X[, y]): Fit the Nested Clusters Optimization estimator.
- fit_predict(X): Perform fit on X and return the predicted Portfolio or Population of Portfolio on X based on the fitted weights.
- get_metadata_routing(): Get metadata routing of this object.
- get_params([deep]): Get parameters for this estimator.
- predict(X): Predict the Portfolio or Population of Portfolio on X based on the fitted weights.
- score(X[, y]): Prediction score.
- set_params(**params): Set the parameters of this estimator.
- fit(X, y=None, **fit_params)
Fit the Nested Clusters Optimization estimator.
- Parameters:
- X : array-like of shape (n_observations, n_assets)
Price returns of the assets.
- y : array-like of shape (n_observations, n_targets), optional
Price returns of factors or a target benchmark. The default is None.
- **fit_params : dict
Parameters to pass to the underlying estimators. Only available if enable_metadata_routing=True, which can be set by using sklearn.set_config(enable_metadata_routing=True). See Metadata Routing User Guide for more details.
- Returns:
- self : NestedClustersOptimization
Fitted estimator.
- fit_predict(X)
Perform fit on X and return the predicted Portfolio or Population of Portfolio on X based on the fitted weights. For factor models, use fit(X, y) then predict(X) separately.
- Parameters:
- X : array-like of shape (n_observations, n_assets)
Price returns of the assets.
- Returns:
- prediction : Portfolio | Population
Portfolio or Population of Portfolio estimated on X based on the fitted weights.
- get_metadata_routing()
Get metadata routing of this object.
Please check the User Guide on how the routing mechanism works.
- Returns:
- routing : MetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)
Get parameters for this estimator.
- Parameters:
- deep : bool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- params : dict
Parameter names mapped to their values.
- predict(X)
Predict the Portfolio or Population of Portfolio on X based on the fitted weights.
Optimization estimators can return a 1D or a 2D array of weights. For a 1D array, the prediction returns a Portfolio. For a 2D array, the prediction returns a Population of Portfolio.
If name is not provided in the portfolio arguments, we use the first 500 characters of the estimator name.
- Parameters:
- X : array-like of shape (n_observations, n_assets)
Price returns of the assets.
- Returns:
- prediction : Portfolio | Population
Portfolio or Population of Portfolio estimated on X based on the fitted weights.
- score(X, y=None)
Prediction score. If the prediction is a single Portfolio, the score is the Sharpe Ratio. If the prediction is a Population of Portfolio, the score is the mean of the Sharpe Ratios of all portfolios in the population.
- Parameters:
- X : array-like of shape (n_observations, n_assets)
Price returns of the assets.
- y : Ignored
Not used, present here for API consistency by convention.
- Returns:
- score : float
The Sharpe Ratio of the portfolio if the prediction is a single Portfolio, or the mean of the Sharpe Ratios of all portfolios if the prediction is a Population of Portfolio.
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
- Parameters:
- **params : dict
Estimator parameters.
- Returns:
- self : estimator instance
Estimator instance.