Hierarchical Risk Parity - CVaR#

This tutorial introduces the HierarchicalRiskParity optimization.

Hierarchical Risk Parity (HRP) is a portfolio optimization method developed by Marcos Lopez de Prado.

This algorithm uses a distance matrix to compute hierarchical clusters using the Hierarchical Tree Clustering algorithm. It then employs seriation to rearrange the assets in the dendrogram, minimizing the distance between leaves.

The final step is the recursive bisection: starting with the topmost cluster and traversing in a top-down manner, each cluster is split into two sub-clusters. For each sub-cluster, we compute the total cluster risk of an inverse-risk allocation. A weighting factor is then computed from these two sub-cluster risks and used to update the cluster weights.
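As a rough illustration of the recursive bisection described above, here is a minimal pure-Python sketch. It assumes variance as the risk measure (as in the original paper) and splits each cluster at the midpoint of the seriated asset order. The names `cluster_risk` and `recursive_bisection` are hypothetical, and this is not skfolio's implementation.

```python
def cluster_risk(cov, idx):
    """Variance of the inverse-variance portfolio restricted to assets `idx`."""
    inv = [1.0 / cov[i][i] for i in idx]
    total = sum(inv)
    w = [v / total for v in inv]
    return sum(
        w[a] * w[b] * cov[i][j]
        for a, i in enumerate(idx)
        for b, j in enumerate(idx)
    )


def recursive_bisection(cov, order):
    """Allocate weights top-down over the seriated asset `order`.

    Assumption: `order` is a permutation of range(len(cov)).
    """
    weights = {i: 1.0 for i in order}
    clusters = [list(order)]
    while clusters:
        cluster = clusters.pop()
        if len(cluster) < 2:
            continue
        mid = len(cluster) // 2
        left, right = cluster[:mid], cluster[mid:]
        risk_left = cluster_risk(cov, left)
        risk_right = cluster_risk(cov, right)
        # Weighting factor: the riskier sub-cluster receives less weight.
        alpha = 1.0 - risk_left / (risk_left + risk_right)
        for i in left:
            weights[i] *= alpha
        for i in right:
            weights[i] *= 1.0 - alpha
        clusters += [left, right]
    return [weights[i] for i in range(len(cov))]


# Toy example with three uncorrelated assets of variance 1, 2 and 4.
cov = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 4.0]]
weights = recursive_bisection(cov, order=[0, 1, 2])
```

With a diagonal covariance matrix the result reduces to the inverse-variance allocation (here 4/7, 2/7 and 1/7), which makes a convenient sanity check.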

Note

The original paper uses the variance as the risk measure and the single-linkage method for the Hierarchical Tree Clustering algorithm. Here we generalize it to multiple risk measures and linkage methods. The default linkage method is set to the Ward variance minimization algorithm, which is more stable and has better properties than the single-linkage method.

In this example, we will use the CVaR risk measure.
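For intuition, CVaR at confidence level beta is the average loss over the worst (1 - beta) share of observations. The sketch below is a simple empirical estimator under that definition. The function name `empirical_cvar` is hypothetical, and skfolio's own estimator may treat the tail boundary differently.

```python
def empirical_cvar(returns, beta=0.95):
    """Average of the worst (1 - beta) share of losses (loss = -return)."""
    losses = sorted((-r for r in returns), reverse=True)
    # Number of tail observations, with at least one (rounding is an assumption).
    k = max(1, round(len(losses) * (1.0 - beta)))
    return sum(losses[:k]) / k


# 95 quiet days and 5 bad days: the 95% CVaR averages the 5 worst losses.
sample = [0.01] * 95 + [-0.02, -0.03, -0.04, -0.05, -0.06]
cvar_95 = empirical_cvar(sample, beta=0.95)
```

Because CVaR looks only at the tail, two assets with the same variance can receive different HRP weights under this risk measure.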

Data#

We load the S&P 500 dataset composed of the daily prices of 20 assets from the SPX Index composition and the Factors dataset composed of the daily prices of 5 ETFs representing common factors:

from plotly.io import show
from sklearn.model_selection import train_test_split

from skfolio import Population, RiskMeasure
from skfolio.cluster import HierarchicalClustering, LinkageMethod
from skfolio.datasets import load_factors_dataset, load_sp500_dataset
from skfolio.distance import KendallDistance
from skfolio.optimization import EqualWeighted, HierarchicalRiskParity
from skfolio.preprocessing import prices_to_returns
from skfolio.prior import FactorModel

prices = load_sp500_dataset()
factor_prices = load_factors_dataset()

prices = prices["2014":]
factor_prices = factor_prices["2014":]

X, y = prices_to_returns(prices, factor_prices)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, shuffle=False)
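The two preprocessing steps above can be pictured with a small pure-Python sketch: prices are converted to linear returns, and the split keeps chronological order (no shuffling) so that the test set lies strictly after the training set. The helper names and the ceiling-based rounding of the test size are assumptions made for illustration.

```python
import math


def linear_returns(prices):
    """Simple period-over-period returns: p_t / p_{t-1} - 1."""
    return [p1 / p0 - 1.0 for p0, p1 in zip(prices, prices[1:])]


def chronological_split(rows, test_size=0.33):
    """Keep the first rows for training and the last rows for testing."""
    n_test = math.ceil(len(rows) * test_size)
    cut = len(rows) - n_test
    return rows[:cut], rows[cut:]


returns = linear_returns([100.0, 110.0, 99.0])
train, test = chronological_split(list(range(6)), test_size=0.33)
```

Shuffling would leak future observations into the training set, which is why `shuffle=False` matters for time series.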

Model#

We create the CVaR Hierarchical Risk Parity model and then fit it on the training set:

model1 = HierarchicalRiskParity(
    risk_measure=RiskMeasure.CVAR, portfolio_params=dict(name="HRP-CVaR-Ward-Pearson")
)
model1.fit(X_train)
model1.weights_

array([0.05033705, 0.02773558, 0.05289115, 0.03632272, 0.059202  ,
       0.02483767, 0.03790179, 0.07464383, 0.03497807, 0.08622477,
       0.06308422, 0.04094166, 0.03144452, 0.08277551, 0.04421773,
       0.04807705, 0.02596219, 0.07596741, 0.0393462 , 0.06310889])

Risk Contribution#

Let’s analyze the risk contribution of the model on the training set:

ptf1 = model1.predict(X_train)
ptf1.plot_contribution(measure=RiskMeasure.CVAR)

Dendrogram#

To analyze the cluster structure, we plot the dendrogram. The blue lines represent distinct clusters composed of a single asset. The remaining colors represent clusters of more than one asset:

model1.hierarchical_clustering_estimator_.plot_dendrogram(heatmap=False)

The horizontal axis represents the assets. The links between clusters are represented as upside-down U-shaped lines. The height of the U indicates the distance between the clusters. For example, the link representing the cluster containing assets HD and WMT has a distance of 0.5 (called cophenetic distance).

When heatmap is set to True, the heatmap of the reordered distance matrix is displayed below the dendrogram and clusters are outlined with yellow squares:

fig = model1.hierarchical_clustering_estimator_.plot_dendrogram()
show(fig)

Linkage Methods#

The clustering can be greatly affected by the choice of the linkage method. The original HRP is based on the single-linkage (equivalent to the minimum spanning tree), which suffers from the chaining effect. In the HierarchicalRiskParity estimator, the default linkage method is set to the Ward variance minimization algorithm, which is more stable and has better properties than the single-linkage method.
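To make the chaining effect more concrete, the sketch below compares two textbook definitions of the distance between clusters on one-dimensional toy points: single linkage (closest pair) and complete linkage (farthest pair). The function names are hypothetical. Complete linkage is used only as a simple contrast here; Ward linkage, the estimator's default, follows a different variance-based criterion.

```python
def single_linkage(a, b):
    """Distance between clusters = distance of their closest pair."""
    return min(abs(x - y) for x in a for y in b)


def complete_linkage(a, b):
    """Distance between clusters = distance of their farthest pair."""
    return max(abs(x - y) for x in a for y in b)


# Two elongated clusters lying end to end on a line.
left, right = [0.0, 1.0, 2.0], [3.0, 4.0, 5.0]
d_single = single_linkage(left, right)      # 1.0
d_complete = complete_linkage(left, right)  # 5.0
```

Single linkage sees these clusters as only 1.0 apart because a single close pair is enough, so it tends to merge long chains of points, and the two methods can produce very different trees.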

However, since the HRP optimization doesn’t utilize the full cluster structure but only the resulting order of the assets, the allocation remains relatively stable regardless of the chosen linkage method.

# To show this effect, let's create a second model with the single-linkage method:
model2 = HierarchicalRiskParity(
    risk_measure=RiskMeasure.CVAR,
    hierarchical_clustering_estimator=HierarchicalClustering(
        linkage_method=LinkageMethod.SINGLE,
    ),
    portfolio_params=dict(name="HRP-CVaR-Single-Pearson"),
)
model2.fit(X_train)

model2.hierarchical_clustering_estimator_.plot_dendrogram(heatmap=True)

We can see that the clustering has been greatly affected by the change of linkage method. However, you will see below that the weights remain relatively stable for the reason explained earlier.

Distance Estimator#

The choice of distance metric also has an important effect on the clustering. The default is the distance derived from the Pearson correlation matrix. This can be changed using the distance estimators.

For example, let’s create a third model with a distance computed from the absolute value of the Kendall correlation matrix:

model3 = HierarchicalRiskParity(
    risk_measure=RiskMeasure.CVAR,
    distance_estimator=KendallDistance(absolute=True),
    portfolio_params=dict(name="HRP-CVaR-Ward-Kendal"),
)
model3.fit(X_train)

model3.hierarchical_clustering_estimator_.plot_dendrogram(heatmap=True)
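For readers unfamiliar with rank correlation, here is a minimal sketch of Kendall's tau and of one common way to turn a correlation into a distance. It assumes the tau-a variant (no tie correction; library implementations typically use tau-b) and the mappings sqrt(0.5 * (1 - rho)) for signed correlation and sqrt(1 - |rho|) for absolute correlation. These formulas and the function names are assumptions for illustration, not a statement of what `KendallDistance` computes internally.

```python
import math
from itertools import combinations


def kendall_tau(x, y):
    """Kendall tau-a: (concordant - discordant) / number of pairs."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


def correlation_to_distance(rho, absolute=False):
    """Map a correlation in [-1, 1] to a distance in [0, 1]."""
    if absolute:
        return math.sqrt(max(0.0, 1.0 - abs(rho)))
    return math.sqrt(max(0.0, 0.5 * (1.0 - rho)))


tau = kendall_tau([1, 2, 3, 4], [1, 3, 2, 4])
distance = correlation_to_distance(tau, absolute=True)
```

With the absolute variant, strongly negatively correlated assets are treated as close to each other, which changes which assets end up in the same cluster.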

Prior Estimator#

Finally, like the other portfolio optimization estimators, HRP uses a prior estimator that fits a PriorModel containing the distribution estimate of asset returns. It represents the investor’s prior beliefs about the model used to estimate this distribution. The default is the EmpiricalPrior estimator.

Let’s create a new model with the FactorModel estimator:

model4 = HierarchicalRiskParity(
    risk_measure=RiskMeasure.CVAR,
    prior_estimator=FactorModel(),
    portfolio_params=dict(name="HRP-CVaR-Factor-Model"),
)
model4.fit(X_train, y_train)

model4.hierarchical_clustering_estimator_.plot_dendrogram(heatmap=True)
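To give a sense of what a factor-model prior does, the sketch below derives asset moments from factor moments using the standard linear factor structure: expected returns are intercepts plus loadings times factor means, and the covariance is the loadings-weighted factor covariance plus a diagonal residual variance. All names and numbers are hypothetical, and this is a simplified picture rather than skfolio's `FactorModel` code.

```python
def factor_model_moments(loadings, intercepts, factor_mean, factor_cov, residual_var):
    """Asset mean vector and covariance matrix implied by a linear factor model."""
    n, k = len(loadings), len(factor_mean)
    # mu_i = alpha_i + sum_f B[i][f] * factor_mean[f]
    mu = [
        intercepts[i] + sum(loadings[i][f] * factor_mean[f] for f in range(k))
        for i in range(n)
    ]
    # cov = B F B^T + diag(residual variances)
    cov = [
        [
            sum(
                loadings[i][f] * factor_cov[f][g] * loadings[j][g]
                for f in range(k)
                for g in range(k)
            )
            + (residual_var[i] if i == j else 0.0)
            for j in range(n)
        ]
        for i in range(n)
    ]
    return mu, cov


# Two assets exposed to a single factor (illustrative numbers only).
mu, cov = factor_model_moments(
    loadings=[[1.0], [0.5]],
    intercepts=[0.0, 0.001],
    factor_mean=[0.01],
    factor_cov=[[0.04]],
    residual_var=[0.01, 0.02],
)
```

Because the factors are observed alongside the assets, fitting such a prior requires the factor returns as well, which is why `y_train` is passed to `fit` for this model.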

To compare the models, we use an equal-weighted benchmark built with the EqualWeighted estimator:

bench = EqualWeighted()
bench.fit(X_train)
bench.weights_

array([0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05,
       0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05])

Prediction#

We call predict on each model and on the benchmark using the test set:

population_test = Population([])
for model in [model1, model2, model3, model4, bench]:
    population_test.append(model.predict(X_test))

population_test.plot_cumulative_returns()

Composition#

From the composition below, we notice that all models are relatively close to each other, as explained earlier:

population_test.plot_composition()

Summary#

Finally, let’s print the summary statistics:

summary = population_test.summary()
summary.loc["Annualized Sharpe Ratio"]

HRP-CVaR-Ward-Pearson      0.86
HRP-CVaR-Single-Pearson    0.84
HRP-CVaR-Ward-Kendal       0.86
HRP-CVaR-Factor-Model      0.87
EqualWeighted              0.86
Name: Annualized Sharpe Ratio, dtype: object
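The annualized figures appear consistent with scaling daily statistics by 252 trading days: the mean by 252 and the standard deviation by the square root of 252, so the Sharpe ratio scales by the square root of 252. The sketch below assumes that factor and a zero risk-free rate; the function name is hypothetical.

```python
import math


def annualized_sharpe(daily_mean, daily_std, periods_per_year=252):
    """Scale a per-period Sharpe ratio to an annual one (zero risk-free rate)."""
    return daily_mean / daily_std * math.sqrt(periods_per_year)


# Rounded daily mean and standard deviation reported for HRP-CVaR-Ward-Pearson.
sharpe = annualized_sharpe(0.00081, 0.0149)
```

With these rounded inputs the result is roughly 0.86, in line with the value reported for the first model.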

summary

	HRP-CVaR-Ward-Pearson	HRP-CVaR-Single-Pearson	HRP-CVaR-Ward-Kendal	HRP-CVaR-Factor-Model	EqualWeighted
Mean	0.081%	0.079%	0.079%	0.081%	0.084%
Annualized Mean	20.41%	19.81%	19.95%	20.29%	21.27%
Variance	0.022%	0.022%	0.022%	0.022%	0.024%
Annualized Variance	5.62%	5.51%	5.42%	5.47%	6.13%
Semi-Variance	0.011%	0.011%	0.011%	0.011%	0.012%
Annualized Semi-Variance	2.89%	2.82%	2.78%	2.78%	3.11%
Standard Deviation	1.49%	1.48%	1.47%	1.47%	1.56%
Annualized Standard Deviation	23.71%	23.48%	23.29%	23.38%	24.75%
Semi-Deviation	1.07%	1.06%	1.05%	1.05%	1.11%
Annualized Semi-Deviation	16.99%	16.78%	16.69%	16.66%	17.63%
Mean Absolute Deviation	0.93%	0.92%	0.91%	0.92%	0.99%
CVaR at 95%	3.51%	3.44%	3.43%	3.42%	3.65%
EVaR at 95%	6.51%	6.51%	6.47%	6.39%	6.58%
Worst Realization	10.64%	10.70%	10.66%	10.50%	10.77%
CDaR at 95%	17.37%	17.36%	16.94%	17.78%	18.54%
MAX Drawdown	35.36%	34.19%	33.56%	34.03%	34.70%
Average Drawdown	3.00%	3.14%	3.05%	3.28%	3.38%
EDaR at 95%	24.28%	23.72%	23.32%	23.86%	24.47%
First Lower Partial Moment	0.46%	0.46%	0.46%	0.46%	0.50%
Ulcer Index	0.053	0.054	0.052	0.056	0.058
Gini Mean Difference	1.42%	1.40%	1.40%	1.41%	1.51%
Value at Risk at 95%	2.02%	1.93%	1.97%	1.97%	2.15%
Drawdown at Risk at 95%	10.14%	10.90%	10.35%	11.49%	12.03%
Entropic Risk Measure at 95%	3.00	3.00	3.00	3.00	3.00
Fourth Central Moment	0.000079%	0.000078%	0.000073%	0.000074%	0.000090%
Fourth Lower Partial Moment	0.000041%	0.000040%	0.000039%	0.000037%	0.000043%
Skew	-8.51%	-6.32%	-11.56%	-1.42%	4.61%
Kurtosis	1588.74%	1630.38%	1577.65%	1565.39%	1526.86%
Sharpe Ratio	0.054	0.053	0.054	0.055	0.054
Annualized Sharpe Ratio	0.86	0.84	0.86	0.87	0.86
Sortino Ratio	0.076	0.074	0.075	0.077	0.076
Annualized Sortino Ratio	1.20	1.18	1.20	1.22	1.21
Mean Absolute Deviation Ratio	0.087	0.086	0.087	0.087	0.085
First Lower Partial Moment Ratio	0.17	0.17	0.17	0.17	0.17
Value at Risk Ratio at 95%	0.040	0.041	0.040	0.041	0.039
CVaR Ratio at 95%	0.023	0.023	0.023	0.024	0.023
Entropic Risk Measure Ratio at 95%	0.00027	0.00026	0.00026	0.00027	0.00028
EVaR Ratio at 95%	0.012	0.012	0.012	0.013	0.013
Worst Realization Ratio	0.0076	0.0073	0.0074	0.0077	0.0078
Drawdown at Risk Ratio at 95%	0.0080	0.0072	0.0076	0.0070	0.0070
CDaR Ratio at 95%	0.0047	0.0045	0.0047	0.0045	0.0046
Calmar Ratio	0.0023	0.0023	0.0024	0.0024	0.0024
Average Drawdown Ratio	0.027	0.025	0.026	0.025	0.025
EDaR Ratio at 95%	0.0033	0.0033	0.0034	0.0034	0.0034
Ulcer Index Ratio	0.015	0.015	0.015	0.014	0.015
Gini Mean Difference Ratio	0.057	0.056	0.057	0.057	0.056
Effective Number of Assets	17.560177235001134	17.301057537801782	17.53328937509076	17.272263586429105	19.999999999999993
Assets Number	20	20	20	20	20
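The Effective Number of Assets row can be read as a diversification measure. A common definition, assumed here, is the inverse of the sum of squared weights, which equals the number of assets for an equal-weighted portfolio and 1 for a single-asset portfolio. The function name is hypothetical.

```python
def effective_number_of_assets(weights):
    """Inverse of the sum of squared weights (inverse Herfindahl index)."""
    return 1.0 / sum(w * w for w in weights)


equal_weighted = effective_number_of_assets([0.05] * 20)
concentrated = effective_number_of_assets([1.0, 0.0, 0.0])
```

Under this definition the equal-weighted benchmark reaches 20 up to floating-point error, while the HRP portfolios score somewhat lower because their weights are uneven.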

