skfolio.cluster.HierarchicalClustering
- class skfolio.cluster.HierarchicalClustering(max_clusters=None, linkage_method=WARD)
Hierarchical Clustering.
- Parameters:
- max_clusters : int, optional
For coherent clustering, the algorithm finds a minimum threshold r so that the cophenetic distance between any two original observations in the same flat cluster is no more than r and no more than max_clusters flat clusters are formed. The default (None) is to estimate the maximal number of clusters based on the Two-Order Difference to Gap Statistic [1].
- linkage_method : LinkageMethod, default=LinkageMethod.WARD
Method for calculating the distance between clusters in the linkage matrix. See the Linkage Methods section of scipy.cluster.hierarchy.linkage for the full descriptions. The default is the Ward variance minimization algorithm (LinkageMethod.WARD).
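The max_clusters criterion described above can be sketched with SciPy alone. This is a minimal illustration, assuming toy 2-D points and using scipy.cluster.hierarchy.fcluster with criterion="maxclust" (whose documented behavior matches the wording above); it is not necessarily how skfolio cuts the tree internally.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

# Hypothetical toy data: three tight, well-separated groups of 2-D points.
points = np.array([
    [0.0, 0.0], [0.1, 0.0],
    [5.0, 5.0], [5.1, 5.0],
    [12.0, 0.0], [12.1, 0.0],
])

condensed = pdist(points)               # 1-D condensed Euclidean distances
Z = linkage(condensed, method="ward")   # Ward linkage matrix

# Request at most 2, then at most 3, flat clusters.
labels_2 = fcluster(Z, t=2, criterion="maxclust")
labels_3 = fcluster(Z, t=3, criterion="maxclust")

n_clusters_2 = len(np.unique(labels_2))
n_clusters_3 = len(np.unique(labels_3))
print(n_clusters_2, n_clusters_3)
```

The value passed as t is an upper bound: SciPy picks the smallest cophenetic threshold for which the number of flat clusters does not exceed it.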
- Attributes:
- n_clusters_ : int
Number of formed clusters.
- labels_ : ndarray of shape (n_assets,)
Labels of each asset.
- linkage_matrix_ : ndarray of shape (n_assets - 1, 4)
Linkage matrix computed from the distance matrix of the distance_estimator.
- condensed_distance_ : ndarray of shape (n_assets * (n_assets - 1) / 2,)
The 1-D condensed distance matrix.
- n_features_in_ : int
Number of assets seen during fit.
- feature_names_in_ : ndarray of shape (n_features_in_,)
Names of assets seen during fit. Defined only when X has asset names that are all strings.
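The shapes listed for the linkage matrix and the condensed distance follow from standard SciPy conventions. The sketch below assumes a hypothetical random distance matrix for 5 assets and uses only SciPy functions, so it illustrates the shapes rather than the estimator itself.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform

n_assets = 5
rng = np.random.default_rng(0)

# Hypothetical symmetric distance matrix with a zero diagonal.
upper = np.triu(rng.uniform(0.1, 1.0, size=(n_assets, n_assets)), k=1)
distance = upper + upper.T

# Condensed form: the strict upper triangle flattened row by row.
condensed = squareform(distance, checks=False)

# Each of the n_assets - 1 merges yields one row of 4 values:
# the two merged cluster ids, the merge distance, and the new cluster size.
Z = linkage(condensed, method="ward")

print(condensed.shape)  # (10,)
print(Z.shape)          # (4, 4)
```

For 5 assets the condensed vector holds 5 * 4 / 2 = 10 pairwise distances, and squareform maps it back to the original square matrix.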
References
[1] “Application of two-order difference to gap statistic”. Yue, Wang & Wei (2009).
Methods
- fit(X[, y]): Fit the Hierarchical Clustering estimator.
- fit_predict(X[, y]): Perform clustering on X and return cluster labels.
- get_metadata_routing(): Get metadata routing of this object.
- get_params([deep]): Get parameters for this estimator.
- plot_dendrogram([heatmap]): Plot the dendrogram.
- set_params(**params): Set the parameters of this estimator.
- fit(X, y=None)
Fit the Hierarchical Clustering estimator.
- Parameters:
- X : array-like of shape (n_assets, n_assets)
Distance matrix of the assets.
- y : Ignored
Not used, present for API consistency by convention.
- Returns:
- self : HierarchicalClustering
Fitted estimator.
- fit_predict(X, y=None, **kwargs)
Perform clustering on X and return cluster labels.
- Parameters:
- X : array-like of shape (n_samples, n_features)
Input data.
- y : Ignored
Not used, present for API consistency by convention.
- **kwargs : dict
Arguments to be passed to fit.
Added in version 1.4.
- Returns:
- labels : ndarray of shape (n_samples,), dtype=np.int64
Cluster labels.
- get_metadata_routing()
Get metadata routing of this object.
Please check the User Guide on how the routing mechanism works.
- Returns:
- routing : MetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)
Get parameters for this estimator.
- Parameters:
- deep : bool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- params : dict
Parameter names mapped to their values.
- plot_dendrogram(heatmap=True)
Plot the dendrogram.
The blue lines represent distinct clusters composed of a single asset. The remaining colors represent clusters of more than one asset.
When heatmap is set to True, the heatmap of the reordered distance matrix is displayed below the dendrogram and clusters are outlined with yellow squares.
The number of clusters used in the plot is the same as the n_clusters_ attribute if it exists; otherwise a default number is used, corresponding to the number of clusters with a distance above 70% of the maximum cluster distance.
- Parameters:
- heatmap : bool, default=True
If True, the distance heatmap is returned with the clusters outlined in yellow.
- Returns:
- fig : Figure
The dendrogram figure.
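The 70% fallback rule can be read as cutting the tree at 0.7 times the largest merge distance. The sketch below is one interpretation of that rule using SciPy only, with hypothetical toy points; it is not taken from the skfolio source.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

# Hypothetical toy points: two pairs and one isolated point.
points = np.array([
    [0.0, 0.0], [0.1, 0.0],
    [5.0, 5.0], [5.1, 5.0],
    [12.0, 0.0],
])
Z = linkage(pdist(points), method="ward")

# Column 2 of the linkage matrix holds the merge distances.
threshold = 0.7 * Z[:, 2].max()

# Flat clusters whose internal cophenetic distance stays at or below the threshold.
labels = fcluster(Z, t=threshold, criterion="distance")
n_default_clusters = len(np.unique(labels))

# Equivalent count for a monotonic linkage such as Ward:
# every merge above the threshold separates one more cluster.
n_from_merges = int((Z[:, 2] > threshold).sum()) + 1
print(n_default_clusters, n_from_merges)
```

The same 0.7 factor is the default color threshold of scipy.cluster.hierarchy.dendrogram, which is why it is a common default for coloring dendrogram branches.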
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
- **params : dict
Estimator parameters.
- Returns:
- self : estimator instance
Estimator instance.
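The <component>__<parameter> convention is inherited from scikit-learn, so it can be illustrated with a plain scikit-learn Pipeline. The step names "scaler" and "clf" below are arbitrary choices for this sketch.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Two named steps; the names become the <component> prefix.
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", LogisticRegression()),
])

# Update one parameter on each nested component in a single call.
pipe.set_params(clf__C=0.5, scaler__with_mean=False)

# get_params(deep=True) exposes the same double-underscore keys.
params = pipe.get_params(deep=True)
print(params["clf__C"], params["scaler__with_mean"])
```

Because set_params returns the estimator itself, calls can be chained, and the same keys are what grid-search utilities use to address nested parameters.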