HRP vs HERC

In this tutorial, we will compare the HierarchicalRiskParity (HRP) optimization with the HierarchicalEqualRiskContribution (HERC) optimization.

For that comparison, we consider a rolling allocation rebalanced every three months (60 business days), where each portfolio is fitted on the preceding year of data (252 business days) to minimize CVaR.
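The rolling scheme can be sketched in plain Python. This is a minimal illustration, not skfolio's WalkForward implementation: the helper walk_forward_windows is hypothetical, and it assumes that an incomplete final test window is simply dropped, which may differ from how skfolio handles the end of the sample.

```python
def walk_forward_windows(n_observations, train_size=252, test_size=60):
    """Return (train, test) index ranges for a rolling, non-expanding walk-forward."""
    windows = []
    start = 0
    while start + train_size + test_size <= n_observations:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        windows.append((train, test))
        start += test_size  # roll forward by one rebalancing period
    return windows


for train, test in walk_forward_windows(1000)[:2]:
    print(f"fit on [{train.start}, {train.stop}) -> hold on [{test.start}, {test.stop})")
# fit on [0, 252) -> hold on [252, 312)
# fit on [60, 312) -> hold on [312, 372)
```

Each portfolio only sees the 252 days before its 60-day holding period, so every evaluated return is out-of-sample.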

We will use GridSearchCV with cross-validation on the training set to select, for each model, the parameters that achieve the highest average out-of-sample Mean-CVaR ratio.

Then, we will evaluate both models on the test set.

Finally, we will use the CombinatorialPurgedCV to analyze the stability and distribution of both models.

Data

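We load the FTSE 100 dataset, which contains the daily prices of 64 assets from the FTSE 100 Index composition from 2000-01-04 to 2023-05-31. We then convert the prices to returns and split them chronologically (without shuffling), keeping the last 33% of the observations as the test set: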

from plotly.io import show
from sklearn.model_selection import GridSearchCV, train_test_split

from skfolio import Population, RatioMeasure, RiskMeasure
from skfolio.cluster import HierarchicalClustering, LinkageMethod
from skfolio.datasets import load_ftse100_dataset
from skfolio.distance import KendallDistance, PearsonDistance
from skfolio.metrics import make_scorer
from skfolio.model_selection import (
    CombinatorialPurgedCV,
    WalkForward,
    cross_val_predict,
    optimal_folds_number,
)
from skfolio.optimization import (
    HierarchicalEqualRiskContribution,
    HierarchicalRiskParity,
)
from skfolio.preprocessing import prices_to_returns

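prices = load_ftse100_dataset()

# Convert daily prices to daily returns
X = prices_to_returns(prices)
# Chronological split (no shuffling): first 67% for training, last 33% for testing
X_train, X_test = train_test_split(X, test_size=0.33, shuffle=False)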
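The two preprocessing steps can be mimicked with NumPy on toy data. This sketch assumes that prices_to_returns computes simple (linear) returns by default, and it mirrors the behavior of train_test_split with shuffle=False and a float test_size, which keeps the last ceil(test_size * n) rows as the test set. simple_returns and chronological_split are hypothetical helpers.

```python
import numpy as np


def simple_returns(prices):
    """Linear returns r_t = p_t / p_{t-1} - 1; the first row has no return and is dropped."""
    prices = np.asarray(prices, dtype=float)
    return prices[1:] / prices[:-1] - 1.0


def chronological_split(x, test_size=0.33):
    """Keep time order: the last ceil(test_size * n) rows form the test set."""
    n_test = int(np.ceil(test_size * len(x)))
    return x[: len(x) - n_test], x[len(x) - n_test :]


toy_prices = [[100.0, 50.0], [110.0, 50.0], [99.0, 55.0], [99.0, 44.0]]
toy_train, toy_test = chronological_split(simple_returns(toy_prices))
print(toy_train)  # returns of the first two days
print(toy_test)  # return of the last day
```

Shuffling is disabled because a random split would leak future information into the training set.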

Model

We create two models: an HRP-CVaR and an HERC-CVaR:

model_hrp = HierarchicalRiskParity(
    risk_measure=RiskMeasure.CVAR,
    hierarchical_clustering_estimator=HierarchicalClustering(),
)

model_herc = HierarchicalEqualRiskContribution(
    risk_measure=RiskMeasure.CVAR,
    hierarchical_clustering_estimator=HierarchicalClustering(),
)
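The main structural difference between the two models is how the dendrogram drives the allocation: HRP recursively bisects the quasi-diagonal asset order into halves, whereas HERC splits along the clusters of the dendrogram. The sketch below is a simplified illustration, not skfolio's implementation: it uses variance instead of CVaR as the risk measure, takes the asset order and cluster labels as given instead of running hierarchical clustering, and collapses HERC's top-down traversal into a single level of clusters. All function names are hypothetical.

```python
import numpy as np


def inverse_variance_weights(cov):
    """Naive risk parity inside a cluster: weights proportional to 1 / variance."""
    ivp = 1.0 / np.diag(cov)
    return ivp / ivp.sum()


def cluster_variance(cov, idx):
    """Variance of the inverse-variance portfolio of the assets in idx."""
    sub = cov[np.ix_(idx, idx)]
    w = inverse_variance_weights(sub)
    return float(w @ sub @ w)


def hrp_weights(cov, order):
    """HRP-style recursive bisection: split the ordered asset list into halves."""
    w = np.ones(cov.shape[0])
    stack = [list(order)]
    while stack:
        items = stack.pop()
        if len(items) <= 1:
            continue
        half = len(items) // 2
        left, right = items[:half], items[half:]
        var_left = cluster_variance(cov, left)
        var_right = cluster_variance(cov, right)
        alpha = 1.0 - var_left / (var_left + var_right)  # the less risky half gets more
        w[left] *= alpha
        w[right] *= 1.0 - alpha
        stack += [left, right]
    return w


def cluster_split_weights(cov, labels):
    """Simplified one-level HERC-style split: allocate across clusters, then inside them."""
    labels = np.asarray(labels)
    clusters = [np.flatnonzero(labels == c) for c in np.unique(labels)]
    inv_risk = np.array([1.0 / cluster_variance(cov, idx) for idx in clusters])
    cluster_w = inv_risk / inv_risk.sum()  # inverse cluster variance across clusters
    w = np.zeros(cov.shape[0])
    for cw, idx in zip(cluster_w, clusters):
        w[idx] = cw * inverse_variance_weights(cov[np.ix_(idx, idx)])
    return w


cov = np.diag([0.04, 0.04, 0.01, 0.01])  # four uncorrelated toy assets
print(hrp_weights(cov, order=[0, 1, 2, 3]))
print(cluster_split_weights(cov, labels=[0, 0, 1, 1]))
```

For uncorrelated assets both rules reduce to inverse-variance weights (here 0.1, 0.1, 0.4, 0.4). With correlated assets, the halves chosen by HRP and the clusters used by HERC generally differ, and so do the resulting weights.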

Parameter Tuning

For both the HRP and HERC models, we find the parameters that maximize the average out-of-sample Mean-CVaR ratio using GridSearchCV with WalkForward cross-validation on the training set. The WalkForward parameters are chosen to simulate a three-month (60 business days) rolling portfolio fitted on the previous year (252 business days):

cv = WalkForward(train_size=252, test_size=60)

grid_search_hrp = GridSearchCV(
    estimator=model_hrp,
    cv=cv,
    n_jobs=-1,
    param_grid={
        "distance_estimator": [PearsonDistance(), KendallDistance()],
        "hierarchical_clustering_estimator__linkage_method": [
            # LinkageMethod.SINGLE,
            LinkageMethod.WARD,
            LinkageMethod.COMPLETE,
        ],
    },
    scoring=make_scorer(RatioMeasure.CVAR_RATIO),
)
grid_search_hrp.fit(X_train)
model_hrp = grid_search_hrp.best_estimator_
print(model_hrp)
HierarchicalRiskParity(distance_estimator=KendallDistance(),
                       hierarchical_clustering_estimator=HierarchicalClustering(),
                       risk_measure=CVaR)
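from sklearn.base import clone

# set_params modifies the object in place, so clone the HRP grid search first
# to keep grid_search_hrp and grid_search_herc as two separate objects
grid_search_herc = clone(grid_search_hrp).set_params(estimator=model_herc)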
grid_search_herc.fit(X_train)
model_herc = grid_search_herc.best_estimator_
print(model_herc)
HierarchicalEqualRiskContribution(distance_estimator=PearsonDistance(),
                                  hierarchical_clustering_estimator=HierarchicalClustering(linkage_method=COMPLETE),
                                  risk_measure=CVaR)
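GridSearchCV evaluates every combination in param_grid on each WalkForward fold and keeps the one with the highest mean score; with the default refit=True, that candidate is then refitted on the whole training set to give best_estimator_. The sketch below mimics only the selection step, using hypothetical string stand-ins for the skfolio objects and made-up fold scores; it is not scikit-learn's implementation.

```python
from itertools import product
from statistics import mean

# Hypothetical string stand-ins for the skfolio objects used in param_grid
param_grid = {
    "distance_estimator": ["PearsonDistance", "KendallDistance"],
    "hierarchical_clustering_estimator__linkage_method": ["WARD", "COMPLETE"],
}


def expand_grid(grid):
    """Every combination of the listed values, similar in spirit to sklearn's ParameterGrid."""
    keys = list(grid)
    return [dict(zip(keys, values)) for values in product(*grid.values())]


def best_candidate(fold_scores):
    """Index of the candidate with the highest mean score across CV folds."""
    return max(range(len(fold_scores)), key=lambda i: mean(fold_scores[i]))


candidates = expand_grid(param_grid)
print(len(candidates))  # 4: 2 distances x 2 linkage methods

# Made-up per-fold Mean-CVaR ratios (one list per candidate), for illustration only
fold_scores = [[0.010, 0.020], [0.012, 0.016], [0.015, 0.019], [0.011, 0.013]]
print(candidates[best_candidate(fold_scores)])
```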

Prediction

We evaluate the two models using the same WalkForward object on the test set:

pred_hrp = cross_val_predict(
    model_hrp,
    X_test,
    cv=cv,
    n_jobs=-1,
    portfolio_params=dict(name="HRP"),
)

pred_herc = cross_val_predict(
    model_herc,
    X_test,
    cv=cv,
    n_jobs=-1,
    portfolio_params=dict(name="HERC"),
)

Each prediction is a MultiPeriodPortfolio made of the successive out-of-sample portfolios of the rolling windows. For easier analysis, we can add them to a Population:

population = Population([pred_hrp, pred_herc])
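Conceptually, the MultiPeriodPortfolio returned by cross_val_predict chains the out-of-sample returns of each rolling portfolio. The NumPy sketch below reproduces that idea under simplifying assumptions: weights are fitted on the training window only and held constant over the following test window, and the hypothetical fit_weights callable stands in for fitting a model (equal weights in the example). The data are synthetic.

```python
import numpy as np


def equal_weights(train_returns):
    """Hypothetical stand-in for a fitted model: equal weights on all assets."""
    n_assets = train_returns.shape[1]
    return np.full(n_assets, 1.0 / n_assets)


def stitch_out_of_sample(returns, windows, fit_weights):
    """Chain the test-window returns of successive rolling portfolios.

    returns: (n_observations, n_assets) array of asset returns.
    windows: list of (train_idx, test_idx) integer index arrays.
    fit_weights: callable mapping training returns to a weight vector.
    """
    pieces = []
    for train_idx, test_idx in windows:
        weights = fit_weights(returns[train_idx])  # fitted on past data only
        pieces.append(returns[test_idx] @ weights)  # held constant over the test window
    return np.concatenate(pieces)


rng = np.random.default_rng(42)
asset_returns = rng.normal(0.0005, 0.01, size=(500, 3))  # synthetic daily returns
windows = [
    (np.arange(s, s + 252), np.arange(s + 252, s + 312))
    for s in range(0, 500 - 312 + 1, 60)
]
oos = stitch_out_of_sample(asset_returns, windows, equal_weights)
print(oos.shape)  # (240,): 4 test windows of 60 days each
```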

Let’s plot the composition of the rolling portfolios:

fig = population.plot_composition(display_sub_ptf_name=False)
show(fig)


Let’s plot the cumulative returns of the rolling portfolios on the test set:

fig = population.plot_cumulative_returns()
show(fig)


Analysis

HERC achieves a higher Mean-CVaR ratio than HRP (0.0159 vs 0.0141), while both models reach almost the same CVaR (2.45% for HERC vs 2.44% for HRP):

for ptf in population:
    print("=" * 25)
    print(" " * 8 + ptf.name)
    print("=" * 25)
    print(f"CVaR : {ptf.cvar:0.2%}")
    print(f"Mean-CVaR ratio : {ptf.cvar_ratio:0.4f}")
    print("\n")
=========================
        HRP
=========================
CVaR : 2.44%
Mean-CVaR ratio : 0.0141


=========================
        HERC
=========================
CVaR : 2.45%
Mean-CVaR ratio : 0.0159
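The two measures printed above can be illustrated with a simplified historical estimator. This sketch assumes a confidence level beta = 0.95 (which we believe is skfolio's default) and averages the worst ceil((1 - beta) * n) returns; skfolio's own CVaR estimator may interpolate when (1 - beta) * n is not an integer, so its values can differ slightly. historical_cvar and mean_cvar_ratio are hypothetical helpers, and the returns are made up.

```python
import numpy as np


def historical_cvar(returns, beta=0.95):
    """Simplified CVaR: average loss over the worst ceil((1 - beta) * n) returns."""
    r = np.sort(np.asarray(returns, dtype=float))
    # Small tolerance so that e.g. (1 - 0.95) * 100 is treated as 5, not 6
    k = max(1, int(np.ceil((1.0 - beta) * len(r) - 1e-9)))
    return -r[:k].mean()


def mean_cvar_ratio(returns, beta=0.95):
    """Mean return divided by CVaR (higher is better)."""
    return float(np.mean(returns)) / historical_cvar(returns, beta)


toy = [0.01] * 18 + [-0.05, -0.03]  # 20 made-up daily returns
print(historical_cvar(toy))  # worst 1 of 20 returns -> 0.05
print(mean_cvar_ratio(toy))  # 0.005 / 0.05, approximately 0.1
```

A lower CVaR alone does not guarantee a higher ratio: the mean return in the numerator matters as much, which is why HERC can have a marginally higher CVaR and still a better ratio.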

Combinatorial Purged Cross-Validation

Using only one testing path (the historical path) may not be enough to compare models. For a more robust analysis, we can use CombinatorialPurgedCV to create multiple testing paths from different combinations of training folds.
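The counts reported by cv.summary follow from simple combinatorics. With N folds of which k are test folds, there are C(N, k) train/test splits, and each fold is tested in C(N - 1, k - 1) of them, which gives that many complete test paths. The sketch below assumes no purging or embargo (as in this example) and rounds the average training size down; cpcv_counts is a hypothetical helper, not part of skfolio.

```python
from math import comb


def cpcv_counts(n_observations, n_folds, n_test_folds):
    """Split and path counts of combinatorial purged CV, assuming no purging or embargo."""
    n_combinations = comb(n_folds, n_test_folds)  # one train/test split per set of test folds
    # Each fold is a test fold in comb(n_folds - 1, n_test_folds - 1) splits,
    # which equals n_combinations * n_test_folds / n_folds
    n_paths = n_combinations * n_test_folds // n_folds
    # Training uses the remaining n_folds - n_test_folds folds (rounded down here)
    avg_train_size = n_observations * (n_folds - n_test_folds) // n_folds
    return n_combinations, n_paths, avg_train_size


print(cpcv_counts(1967, 16, 14))  # (120, 105, 245)
```

With 16 folds and 14 test folds on the 1967 test observations, this gives 120 training combinations, 105 test paths and an average training size of about 245 days.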

We choose n_folds and n_test_folds to obtain around 100 test paths and an average training size of 252 days:

n_folds, n_test_folds = optimal_folds_number(
    n_observations=X_test.shape[0],
    target_n_test_paths=100,
    target_train_size=252,
)

cv = CombinatorialPurgedCV(n_folds=n_folds, n_test_folds=n_test_folds)
cv.summary(X_test)
Number of Observations             1967
Total Number of Folds                16
Number of Test Folds                 14
Purge Size                            0
Embargo Size                          0
Average Training Size               245
Number of Test Paths                105
Number of Training Combinations     120
dtype: int64

We then compute the out-of-sample predictions of both models on every test path:
pred_hrp = cross_val_predict(
    model_hrp,
    X_test,
    cv=cv,
    n_jobs=-1,
    portfolio_params=dict(tag="HRP"),
)
pred_herc = cross_val_predict(
    model_herc,
    X_test,
    cv=cv,
    n_jobs=-1,
    portfolio_params=dict(tag="HERC"),
)

The prediction is a Population of MultiPeriodPortfolio objects, where each MultiPeriodPortfolio represents one testing path of a rolling portfolio. For easier analysis, we can merge the populations of both models:

population = pred_hrp + pred_herc

Distribution

We plot the out-of-sample distribution of Mean-CVaR Ratio for each model:

fig = population.plot_distribution(
    measure_list=[RatioMeasure.CVAR_RATIO], tag_list=["HRP", "HERC"], n_bins=50
)
show(fig)


for pred in [pred_hrp, pred_herc]:
    print("=" * 25)
    print(" " * 8 + pred[0].tag)
    print("=" * 25)
    print(
        "Average Mean-CVaR ratio :"
        f" {pred.measures_mean(measure=RatioMeasure.CVAR_RATIO):0.4f}"
    )
    print(
        "Std Mean-CVaR ratio :"
        f" {pred.measures_std(measure=RatioMeasure.CVAR_RATIO):0.4f}"
    )
    print("\n")
=========================
        HRP
=========================
Average Mean-CVaR ratio : 0.0149
Std Mean-CVaR ratio : 0.0005


=========================
        HERC
=========================
Average Mean-CVaR ratio : 0.0157
Std Mean-CVaR ratio : 0.0029

In terms of Mean-CVaR ratio distribution, HERC has a higher mean than HRP (0.0157 vs 0.0149) but also a much higher standard deviation (0.0029 vs 0.0005). In other words, HERC is less stable than HRP but performs slightly better on average.
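To put the two standard deviations on the same scale, we can divide each by its mean (coefficient of variation) using the values printed above. This is plain arithmetic on the reported numbers, not an additional skfolio output.

```python
# Mean and standard deviation of the per-path Mean-CVaR ratio, as printed above
stats = {"HRP": (0.0149, 0.0005), "HERC": (0.0157, 0.0029)}

for name, (mu, sigma) in stats.items():
    coef_var = sigma / mu  # dispersion relative to the mean
    print(f"{name}: CV = {coef_var:.2f}, mean +/- 1 std = [{mu - sigma:.4f}, {mu + sigma:.4f}]")
```

HERC's relative dispersion is more than five times larger (about 0.18 vs 0.03), and its one-standard-deviation band [0.0128, 0.0186] contains HRP's average of 0.0149, which is why its advantage should be read as slight.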

We can do the same analysis for other measures:

fig = population.plot_distribution(
    measure_list=[
        RatioMeasure.ANNUALIZED_SHARPE_RATIO,
        RatioMeasure.ANNUALIZED_SORTINO_RATIO,
    ],
    tag_list=["HRP", "HERC"],
    n_bins=50,
)
show(fig)
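For reference, the two annualized ratios can be sketched as follows, assuming a zero risk-free rate, 252 periods per year, and a Sortino denominator built from deviations below the mean return. skfolio's exact conventions (for example the minimum acceptable return or the degrees-of-freedom correction) may differ, and the helper names and returns are hypothetical.

```python
import numpy as np


def annualized_sharpe(returns, periods_per_year=252):
    """Mean over standard deviation, scaled by sqrt(periods_per_year); zero risk-free rate."""
    r = np.asarray(returns, dtype=float)
    return r.mean() / r.std(ddof=1) * np.sqrt(periods_per_year)


def annualized_sortino(returns, periods_per_year=252):
    """Like Sharpe, but the denominator only uses deviations below the mean."""
    r = np.asarray(returns, dtype=float)
    downside = np.minimum(r - r.mean(), 0.0)
    semi_deviation = np.sqrt(np.sum(downside**2) / (len(r) - 1))
    return r.mean() / semi_deviation * np.sqrt(periods_per_year)


toy_returns = [0.01, -0.01, 0.02, 0.0]  # made-up daily returns
print(annualized_sharpe(toy_returns), annualized_sortino(toy_returns))
```

Because the semi-deviation never exceeds the standard deviation, the Sortino ratio is at least as large as the Sharpe ratio whenever the mean return is positive.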
