skfolio.cluster.HierarchicalClustering

class skfolio.cluster.HierarchicalClustering(max_clusters=None, linkage_method=LinkageMethod.WARD)

Hierarchical Clustering.

Parameters:
max_clusters : int, optional

For coherent clustering, the algorithm finds a minimum threshold r such that the cophenetic distance between any two original observations in the same flat cluster is no more than r and no more than max_clusters flat clusters are formed. By default (None), the maximum number of clusters is estimated with the Two-Order Difference to Gap Statistic [1].

linkage_method : LinkageMethod, default=LinkageMethod.WARD

Method used to compute the distance between clusters when building the linkage matrix. See the Linkage Methods section of scipy.cluster.hierarchy.linkage for full descriptions. The default is the Ward variance minimization algorithm (LinkageMethod.WARD).
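Both parameters can be illustrated with SciPy directly. The sketch below assumes the estimator builds a SciPy linkage from the condensed distance matrix and cuts it with scipy.cluster.hierarchy.fcluster using criterion="maxclust"; the wording of max_clusters mirrors that criterion, but this is not a confirmed description of skfolio's internals. The distance matrix is made-up toy data, and the method strings are SciPy's names rather than LinkageMethod members.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

# Toy distance matrix for four assets (made-up values):
# assets 0 and 1 are close, assets 2 and 3 are close, the two pairs are far apart.
distance = np.array([
    [0.0, 0.1, 0.8, 0.9],
    [0.1, 0.0, 0.7, 0.8],
    [0.8, 0.7, 0.0, 0.2],
    [0.9, 0.8, 0.2, 0.0],
])
condensed = squareform(distance, checks=False)  # 1-D upper triangle

max_clusters = 2
results = {}
for method in ["single", "complete", "average", "ward"]:
    # Build the hierarchy with the chosen inter-cluster distance rule.
    Z = linkage(condensed, method=method)
    # "maxclust": smallest cophenetic threshold giving at most t flat clusters.
    labels = fcluster(Z, t=max_clusters, criterion="maxclust")
    results[method] = labels
    print(method, labels)
```

On this well-separated toy data every method recovers the two pairs; on real asset distances the methods can produce different trees and therefore different flat clusters.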

Attributes:
n_clusters_ : int

Number of formed clusters.

labels_ : ndarray of shape (n_assets,)

Cluster label of each asset.

linkage_matrix_ : ndarray of shape (n_assets - 1, 4)

Linkage matrix computed from the distance matrix X passed to fit.

condensed_distance_ : ndarray of shape (n_assets * (n_assets - 1) / 2,)

The 1-D condensed distance matrix, with $\binom{n_{\text{assets}}}{2}$ entries.

n_features_in_ : int

Number of assets seen during fit.

feature_names_in_ : ndarray of shape (n_features_in_,)

Names of assets seen during fit. Defined only when X has asset names that are all strings.
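The shapes of linkage_matrix_ and condensed_distance_ follow SciPy's conventions. A minimal sketch with SciPy on random toy data (not the estimator itself) shows how both relate to n_assets:

```python
from math import comb

import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform

n_assets = 5
rng = np.random.default_rng(42)
# Toy Euclidean distance matrix built from random points.
points = rng.normal(size=(n_assets, 2))
distance = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)

# Condensed form: entries above the diagonal, flattened row by row.
condensed = squareform(distance, checks=False)
linkage_matrix = linkage(condensed, method="ward")

print(condensed.shape)       # (10,) because comb(5, 2) == 10
print(linkage_matrix.shape)  # (4, 4): one row per merge
# Each row: [cluster index a, cluster index b, merge distance, size of new cluster]
print(linkage_matrix[-1])
```

The last row always merges everything, so its fourth column equals n_assets.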

References

[1] Yue, Wang & Wei (2009). “Application of two-order difference to gap statistic”.

Methods

fit(X[, y])

Fit the Hierarchical Clustering estimator.

fit_predict(X[, y])

Perform clustering on X and return cluster labels.

get_metadata_routing()

Get metadata routing of this object.

get_params([deep])

Get parameters for this estimator.

plot_dendrogram([heatmap])

Plot the dendrogram.

set_params(**params)

Set the parameters of this estimator.

fit(X, y=None)

Fit the Hierarchical Clustering estimator.

Parameters:
X : array-like of shape (n_assets, n_assets)

Distance matrix of the assets.

y : Ignored

Not used, present for API consistency by convention.

Returns:
self : HierarchicalClustering

Fitted estimator.
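fit expects a precomputed (n_assets, n_assets) distance matrix rather than raw returns. As an illustration only, one common choice (assumed here, not prescribed by this estimator) is the correlation distance $d_{ij} = \sqrt{\tfrac{1}{2}(1 - \rho_{ij})}$ computed from asset returns; the returns below are simulated:

```python
import numpy as np

# Simulated daily returns: 250 observations (rows) of 4 assets (columns).
rng = np.random.default_rng(1)
returns = rng.normal(scale=0.01, size=(250, 4))

# Pearson correlation between assets (columns).
corr = np.corrcoef(returns, rowvar=False)

# Correlation distance in [0, 1]; clip guards against tiny negative round-off.
distance = np.sqrt(np.clip(0.5 * (1.0 - corr), 0.0, None))
np.fill_diagonal(distance, 0.0)  # an asset is at distance 0 from itself

print(distance.shape)  # (4, 4)
```

A symmetric matrix of this form, with a zero diagonal, is the kind of input described for X above.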

fit_predict(X, y=None, **kwargs)

Perform clustering on X and return cluster labels.

Parameters:
X : array-like of shape (n_assets, n_assets)

Distance matrix of the assets, passed to fit.

y : Ignored

Not used, present for API consistency by convention.

**kwargs : dict

Arguments passed to fit.

Added in scikit-learn version 1.4.

Returns:
labels : ndarray of shape (n_assets,)

Cluster labels of the assets (the fitted labels_ attribute).

get_metadata_routing()#

Get metadata routing of this object.

See the scikit-learn User Guide for how the routing mechanism works.

Returns:
routing : MetadataRequest

A MetadataRequest encapsulating routing information.

get_params(deep=True)

Get parameters for this estimator.

Parameters:
deep : bool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
params : dict

Parameter names mapped to their values.

plot_dendrogram(heatmap=True)

Plot the dendrogram.

The blue lines represent distinct clusters composed of a single asset. The remaining colors represent clusters of more than one asset.

When heatmap is set to True, the heatmap of the reordered distance matrix is displayed below the dendrogram and clusters are outlined with yellow squares.
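The reordered distance matrix can be sketched by permuting rows and columns into the dendrogram's leaf order. This assumes the heatmap follows SciPy's leaf order (scipy.cluster.hierarchy.leaves_list); skfolio's plotting code may differ in detail. The matrix is toy data with two interleaved pairs, (0, 2) and (1, 3):

```python
import numpy as np
from scipy.cluster.hierarchy import leaves_list, linkage
from scipy.spatial.distance import squareform

# Toy distances: assets 0 and 2 are close, assets 1 and 3 are close.
distance = np.array([
    [0.0, 0.8, 0.1, 0.9],
    [0.8, 0.0, 0.7, 0.2],
    [0.1, 0.7, 0.0, 0.8],
    [0.9, 0.2, 0.8, 0.0],
])
Z = linkage(squareform(distance, checks=False), method="ward")

# Leaf order of the dendrogram, left to right.
order = leaves_list(Z)
# Permute rows and columns so that merged assets sit next to each other,
# which turns each cluster into a contiguous block on the diagonal.
reordered = distance[np.ix_(order, order)]
print(order)
print(reordered)
```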

The number of clusters used in the plot is the same as the n_clusters_ attribute if it exists; otherwise a default number is used, corresponding to the number of clusters with a distance above 70% of the maximum cluster distance.
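The 70% rule matches the default color threshold of SciPy's dendrogram, 0.7 * max(Z[:, 2]). A minimal sketch, assuming the default cluster count is obtained by cutting the tree at that height (not confirmed as skfolio's exact code), on toy data:

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

# Toy distances: two tight pairs, (0, 1) and (2, 3), far from each other.
distance = np.array([
    [0.0, 0.1, 0.8, 0.9],
    [0.1, 0.0, 0.7, 0.8],
    [0.8, 0.7, 0.0, 0.2],
    [0.9, 0.8, 0.2, 0.0],
])
Z = linkage(squareform(distance, checks=False), method="ward")

# 70% of the largest merge distance (third column of the linkage matrix).
threshold = 0.7 * Z[:, 2].max()

# Cut the tree at that height: merges above the threshold are undone.
labels = fcluster(Z, t=threshold, criterion="distance")
n_clusters = len(np.unique(labels))
print(threshold, n_clusters)
```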

Parameters:
heatmap : bool, default=True

If True, the distance heatmap is displayed below the dendrogram, with the clusters outlined in yellow.

Returns:
fig : Figure

The dendrogram figure.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
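The <component>__<parameter> convention can be sketched in plain Python: split each key on its first double underscore and group the remainder by component. The helper split_nested_params and the component names below are hypothetical, for illustration only; scikit-learn's set_params performs a similar grouping internally, but this is not its code.

```python
def split_nested_params(params):
    """Group 'component__parameter' keys by component (hypothetical helper).

    Keys without '__' are treated as parameters of the outer estimator.
    """
    own, nested = {}, {}
    for key, value in params.items():
        # Split only on the first '__' so deeper paths like 'a__b__c' are kept
        # intact for the nested component to route further.
        component, sep, sub_key = key.partition("__")
        if sep:
            nested.setdefault(component, {})[sub_key] = value
        else:
            own[key] = value
    return own, nested


own, nested = split_nested_params(
    {"verbose": True, "clustering__max_clusters": 5, "pipe__step__alpha": 0.1}
)
print(own)     # {'verbose': True}
print(nested)  # {'clustering': {'max_clusters': 5}, 'pipe': {'step__alpha': 0.1}}
```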

Parameters:
**params : dict

Estimator parameters.

Returns:
self : estimator instance

Estimator instance.