skfolio.cluster.HierarchicalClustering
- class skfolio.cluster.HierarchicalClustering(max_clusters=None, linkage_method=LinkageMethod.WARD)
Hierarchical Clustering.
- Parameters:
- max_clusters : int, optional
For coherent clustering, the algorithm finds a minimum threshold r so that the cophenetic distance between any two original observations in the same flat cluster is no more than r and no more than max_clusters flat clusters are formed. The default (None) is to estimate the maximal number of clusters based on the Two-Order Difference to Gap Statistic [1].
- linkage_method : LinkageMethod, default=LinkageMethod.WARD
Methods for calculating the distance between clusters in the linkage matrix. See the Linkage Methods section of scipy.cluster.hierarchy.linkage for the full descriptions. The default is the Ward variance minimization algorithm, LinkageMethod.WARD.
- Attributes:
- n_clusters_ : int
Number of formed clusters.
- labels_ : ndarray of shape (n_assets,)
Labels of each asset.
- linkage_matrix_ : ndarray of shape (n_assets - 1, 4)
Linkage matrix computed from the distance matrix of the distance_estimator.
- condensed_distance_ : ndarray of shape (n_assets * (n_assets - 1) / 2,)
The 1-D condensed distance matrix.
- n_features_in_ : int
Number of assets seen during fit.
- feature_names_in_ : ndarray of shape (n_features_in_,)
Names of assets seen during fit. Defined only when X has asset names that are all strings.
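The shapes of the attributes above, and the max_clusters rule, can be illustrated directly with SciPy. This is a minimal sketch, not skfolio's implementation: the 4-asset distance matrix is made up, and the use of linkage followed by fcluster with criterion="maxclust" is an assumption about how the parameter description maps onto SciPy.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

# Hypothetical symmetric distance matrix for 4 assets (illustrative values).
distance = np.array(
    [
        [0.00, 0.10, 0.90, 0.80],
        [0.10, 0.00, 0.85, 0.90],
        [0.90, 0.85, 0.00, 0.20],
        [0.80, 0.90, 0.20, 0.00],
    ]
)
n_assets = distance.shape[0]

# 1-D condensed form: one entry per unordered pair of assets.
condensed = squareform(distance, checks=False)

# Ward linkage on the condensed distances: one row per merge.
linkage_matrix = linkage(condensed, method="ward")

# Flat clusters: smallest threshold that yields at most max_clusters clusters.
max_clusters = 2
labels = fcluster(linkage_matrix, t=max_clusters, criterion="maxclust")

print(condensed.shape)       # n_assets * (n_assets - 1) / 2 entries
print(linkage_matrix.shape)  # (n_assets - 1, 4)
print(labels)                # SciPy labels start at 1
```

With these values, assets 0 and 1 land in one cluster and assets 2 and 3 in the other, because each pair is much closer to each other than to the remaining assets.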
Methods
- fit(X[, y]): Fit the Hierarchical Clustering estimator.
- fit_predict(X[, y]): Perform clustering on X and return cluster labels.
- get_metadata_routing(): Get metadata routing of this object.
- get_params([deep]): Get parameters for this estimator.
- plot_dendrogram([heatmap]): Plot the dendrogram.
- set_params(**params): Set the parameters of this estimator.
References
[1] “Application of two-order difference to gap statistic”. Yue, Wang & Wei (2009).
- fit(X, y=None)
Fit the Hierarchical Clustering estimator.
- Parameters:
- X : array-like of shape (n_assets, n_assets)
Distance matrix of the assets.
- y : Ignored
Not used, present for API consistency by convention.
- Returns:
- self : HierarchicalClustering
Fitted estimator.
- fit_predict(X, y=None, **kwargs)
Perform clustering on X and return cluster labels.
- Parameters:
- X : array-like of shape (n_samples, n_features)
Input data.
- y : Ignored
Not used, present for API consistency by convention.
- **kwargs : dict
Arguments to be passed to fit.
Added in version 1.4.
- Returns:
- labels : ndarray of shape (n_samples,), dtype=np.int64
Cluster labels.
- get_metadata_routing()
Get metadata routing of this object.
Please check the User Guide on how the routing mechanism works.
- Returns:
- routing : MetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)
Get parameters for this estimator.
- Parameters:
- deep : bool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- params : dict
Parameter names mapped to their values.
- plot_dendrogram(heatmap=True)
Plot the dendrogram.
The blue lines represent distinct clusters composed of a single asset. The remaining colors represent clusters of more than one asset.
When heatmap is set to True, the heatmap of the reordered distance matrix is displayed below the dendrogram and clusters are outlined with yellow squares.
The number of clusters used in the plot is the same as the n_clusters_ attribute if it exists; otherwise a default number is used, corresponding to the number of clusters with a distance above 70% of the maximum cluster distance.
- Parameters:
- heatmap : bool, default=True
If this is set to True, the distance heatmap is returned with the clusters outlined in yellow.
- Returns:
- fig : Figure
The dendrogram figure.
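The 70% fallback described above appears to correspond to the default color threshold of scipy.cluster.hierarchy.dendrogram, which is 0.7 times the largest merge distance. The sketch below shows one way such a cluster count could be reproduced with SciPy alone. It is an illustration under stated assumptions, not skfolio's plotting code: the 5-asset distance matrix is made up, and cutting the tree with fcluster and criterion="distance" is an assumed equivalent of the coloring rule.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

# Hypothetical distance matrix: two tight pairs plus one isolated asset.
distance = np.array(
    [
        [0.00, 0.10, 0.90, 0.80, 1.00],
        [0.10, 0.00, 0.85, 0.90, 1.00],
        [0.90, 0.85, 0.00, 0.20, 1.00],
        [0.80, 0.90, 0.20, 0.00, 1.00],
        [1.00, 1.00, 1.00, 1.00, 0.00],
    ]
)

linkage_matrix = linkage(squareform(distance, checks=False), method="ward")

# The third column of the linkage matrix holds the merge distances.
threshold = 0.7 * linkage_matrix[:, 2].max()

# Cut the tree at the threshold and count the resulting flat clusters.
labels = fcluster(linkage_matrix, t=threshold, criterion="distance")
n_clusters = len(np.unique(labels))

print(n_clusters)
```

Here the two tight pairs merge well below the threshold while the isolated asset does not, so the cut leaves three clusters, one of which contains a single asset (the case the dendrogram draws in blue).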
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
- **params : dict
Estimator parameters.
- Returns:
- self : estimator instance
Estimator instance.
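The <component>__<parameter> convention and the deep flag of get_params are inherited from the scikit-learn estimator API, so they can be sketched with generic scikit-learn objects. The pipeline below is only a stand-in chosen for illustration (the step names "scaler" and "clf" are arbitrary); it is not part of skfolio.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in nested estimator with two named components.
pipe = Pipeline([("scaler", StandardScaler()), ("clf", LogisticRegression())])

# deep=False lists only the pipeline's own parameters;
# deep=True also exposes each component's parameters as <component>__<parameter>.
shallow = pipe.get_params(deep=False)
deep = pipe.get_params(deep=True)
print("clf__C" in shallow, "clf__C" in deep)

# Update nested parameters in place; set_params returns the estimator itself.
returned = pipe.set_params(clf__C=10.0, scaler__with_mean=False)
print(pipe.named_steps["clf"].C)
```

On a simple estimator such as HierarchicalClustering, the same methods would be called with plain parameter names, for example set_params(max_clusters=5).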