HRP vs HERC#
In this tutorial, we compare the HierarchicalRiskParity (HRP) optimization with the HierarchicalEqualRiskContribution (HERC) optimization.
For that comparison, we consider a 3-month (60 business days) rolling allocation fitted on the preceding year of data (252 business days) that minimizes the CVaR.
We will employ GridSearchCV to select, via cross-validation on the training set, the parameters of each model that achieve the highest average out-of-sample Mean-CVaR ratio (the ratio of the portfolio's mean return to its CVaR).
Then, we will evaluate the models on the test set and compare them with the equal-weighted benchmark.
Finally, we will use the CombinatorialPurgedCV to analyze the stability and distribution of both models.
Data#
We load the FTSE 100 dataset, composed of the daily prices of 64 assets from the FTSE 100 Index, starting from 2000-01-04 up to 2023-05-31:
from plotly.io import show
from sklearn import clone
from sklearn.model_selection import GridSearchCV, train_test_split
from skfolio import Population, RatioMeasure, RiskMeasure
from skfolio.cluster import HierarchicalClustering, LinkageMethod
from skfolio.datasets import load_ftse100_dataset
from skfolio.distance import KendallDistance, PearsonDistance
from skfolio.metrics import make_scorer
from skfolio.model_selection import (
CombinatorialPurgedCV,
WalkForward,
cross_val_predict,
optimal_folds_number,
)
from skfolio.optimization import (
HierarchicalEqualRiskContribution,
HierarchicalRiskParity,
)
from skfolio.preprocessing import prices_to_returns
prices = load_ftse100_dataset()
X = prices_to_returns(prices)
X_train, X_test = train_test_split(X, test_size=0.33, shuffle=False)
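Since shuffle=False, the split is chronological, which avoids look-ahead bias. A quick sanity check (a minimal sketch; the exact dates depend on the dataset version):
print(f"Train: {X_train.index[0].date()} -> {X_train.index[-1].date()}")
print(f"Test:  {X_test.index[0].date()} -> {X_test.index[-1].date()}")
# The last training date must precede the first test date
assert X_train.index[-1] < X_test.index[0]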
Model#
We create two models: an HRP-CVaR model and a HERC-CVaR model:
model_hrp = HierarchicalRiskParity(
risk_measure=RiskMeasure.CVAR,
hierarchical_clustering_estimator=HierarchicalClustering(),
)
model_herc = HierarchicalEqualRiskContribution(
risk_measure=RiskMeasure.CVAR,
hierarchical_clustering_estimator=HierarchicalClustering(),
)
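Both estimators follow the scikit-learn API, so they can also be fitted directly on a returns DataFrame. A minimal sketch (weights_ is the fitted attribute exposed by skfolio optimization estimators):
model_hrp.fit(X_train)
# One weight per asset; HRP produces a long-only, fully invested allocation
print(model_hrp.weights_.shape)
print(model_hrp.weights_.sum())
Note that GridSearchCV clones its estimator before fitting, so this direct fit does not affect the parameter search below.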
Parameter Tuning#
For both the HRP and HERC models, we find the parameters that maximize the average out-of-sample Mean-CVaR ratio using GridSearchCV with WalkForward cross-validation on the training set. The WalkForward parameters are chosen to simulate a three-month (60 business days) rolling portfolio fitted on the preceding year (252 business days):
cv = WalkForward(train_size=252, test_size=60)
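WalkForward is a scikit-learn compatible splitter, so we can inspect the windows it will generate before running the search. A minimal sketch printing the first two folds:
for i, (train_idx, test_idx) in enumerate(cv.split(X_train)):
    # Each fold trains on 252 days and tests on the next 60 days
    print(f"Fold {i}: train={len(train_idx)} days, test={len(test_idx)} days")
    if i >= 1:
        break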
grid_search_hrp = GridSearchCV(
estimator=model_hrp,
cv=cv,
n_jobs=-1,
param_grid={
"distance_estimator": [PearsonDistance(), KendallDistance()],
"hierarchical_clustering_estimator__linkage_method": [
# LinkageMethod.SINGLE,
LinkageMethod.WARD,
LinkageMethod.COMPLETE,
],
},
scoring=make_scorer(RatioMeasure.CVAR_RATIO),
)
grid_search_hrp.fit(X_train)
model_hrp = grid_search_hrp.best_estimator_
print(model_hrp)
HierarchicalRiskParity(distance_estimator=KendallDistance(),
hierarchical_clustering_estimator=HierarchicalClustering(),
risk_measure=CVaR)
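Note that skfolio's make_scorer also accepts a custom function taking the predicted portfolio, in case no built-in ratio measure fits your objective. A hedged sketch (custom_score is a hypothetical example, not part of this tutorial's tuning):
def custom_score(prediction):
    # Hypothetical utility: reward mean return, penalize CVaR twice as much
    return prediction.mean - 2 * prediction.cvar

custom_scorer = make_scorer(custom_score)
In this tutorial we keep the built-in RatioMeasure.CVAR_RATIO.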
# Clone so that the HRP search results are preserved before swapping the estimator
grid_search_herc = clone(grid_search_hrp).set_params(estimator=model_herc)
grid_search_herc.fit(X_train)
model_herc = grid_search_herc.best_estimator_
print(model_herc)
HierarchicalEqualRiskContribution(distance_estimator=PearsonDistance(),
hierarchical_clustering_estimator=HierarchicalClustering(linkage_method=COMPLETE),
risk_measure=CVaR)
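GridSearchCV also stores the full search results in cv_results_, which is useful for checking that the selected parameters are not a borderline winner. A minimal sketch using pandas:
import pandas as pd

results = pd.DataFrame(grid_search_hrp.cv_results_)
# Mean and standard deviation of the Mean-CVaR ratio across the walk-forward folds
print(results[["params", "mean_test_score", "std_test_score"]])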
Prediction#
We evaluate the two models using the same WalkForward
object on the test set:
pred_hrp = cross_val_predict(
model_hrp,
X_test,
cv=cv,
n_jobs=-1,
portfolio_params=dict(name="HRP"),
)
pred_herc = cross_val_predict(
model_herc,
X_test,
cv=cv,
n_jobs=-1,
portfolio_params=dict(name="HERC"),
)
Each predicted object is a MultiPeriodPortfolio. For improved analysis, we can add them to a Population:
population = Population([pred_hrp, pred_herc])
Let’s plot the compositions of the rolling portfolios:
population.plot_composition(display_sub_ptf_name=False)
Let’s plot the cumulative returns of the rolling portfolios on the test set:
population.plot_cumulative_returns()
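For a broader side-by-side comparison, the population's summary table gathers many measures at once (a quick sketch, assuming the summary() method available on skfolio's Population):
print(population.summary())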
Analysis#
HERC outperforms HRP in terms of Mean-CVaR ratio maximization, while both models achieve nearly identical CVaR:
for ptf in population:
print("=" * 25)
print(" " * 8 + ptf.name)
print("=" * 25)
print(f"CVaR : {ptf.cvar:0.2%}")
print(f"Mean-CVaR ratio : {ptf.cvar_ratio:0.4f}")
print("\n")
=========================
HRP
=========================
CVaR : 2.44%
Mean-CVaR ratio : 0.0141
=========================
HERC
=========================
CVaR : 2.45%
Mean-CVaR ratio : 0.0159
Combinatorial Purged Cross-Validation#
Using only one testing path (the historical path) may not be enough to compare models. For a more robust analysis, we can use the CombinatorialPurgedCV to create multiple testing paths from different combinations of training folds.
We choose n_folds and n_test_folds to obtain approximately 100 test paths and an average training size of 252 days:
n_folds, n_test_folds = optimal_folds_number(
n_observations=X_test.shape[0],
target_n_test_paths=100,
target_train_size=252,
)
cv = CombinatorialPurgedCV(n_folds=n_folds, n_test_folds=n_test_folds)
cv.summary(X_test)
Number of Observations 1967
Total Number of Folds 16
Number of Test Folds 14
Purge Size 0
Embargo Size 0
Average Training Size 245
Number of Test Paths 105
Number of Training Combinations 120
dtype: int64
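These counts follow from the combinatorics of the splitter: choosing which 14 of the 16 folds serve as test folds gives C(16, 14) = 120 training combinations, and recombining their test folds yields 120 × 14 / 16 = 105 distinct test paths. A quick check:
from math import comb

print(comb(n_folds, n_test_folds))  # 120 training combinations
print(comb(n_folds, n_test_folds) * n_test_folds // n_folds)  # 105 test paths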
pred_hrp = cross_val_predict(
model_hrp,
X_test,
cv=cv,
n_jobs=-1,
portfolio_params=dict(tag="HRP"),
)
pred_herc = cross_val_predict(
model_herc,
X_test,
cv=cv,
n_jobs=-1,
portfolio_params=dict(tag="HERC"),
)
The predicted object is a Population of MultiPeriodPortfolio. Each MultiPeriodPortfolio represents one testing path of a rolling portfolio. For improved analysis, we can merge the populations of each model:
population = pred_hrp + pred_herc
Distribution#
We plot the out-of-sample distribution of Mean-CVaR Ratio for each model:
population.plot_distribution(
measure_list=[RatioMeasure.CVAR_RATIO], tag_list=["HRP", "HERC"], n_bins=50
)
for pred in [pred_hrp, pred_herc]:
print("=" * 25)
print(" " * 8 + pred[0].tag)
print("=" * 25)
print(
"Average Mean-CVaR ratio :"
f" {pred.measures_mean(measure=RatioMeasure.CVAR_RATIO):0.4f}"
)
print(
"Std Mean-CVaR ratio :"
f" {pred.measures_std(measure=RatioMeasure.CVAR_RATIO):0.4f}"
)
print("\n")
=========================
HRP
=========================
Average Mean-CVaR ratio : 0.0149
Std Mean-CVaR ratio : 0.0005
=========================
HERC
=========================
Average Mean-CVaR ratio : 0.0157
Std Mean-CVaR ratio : 0.0029
We can see that, in terms of Mean-CVaR ratio distribution, the HERC model has a higher mean than the HRP model but also a higher standard deviation. In other words, HERC is less stable than HRP but performs slightly better on average.
We can do the same analysis for other measures:
fig = population.plot_distribution(
measure_list=[
RatioMeasure.ANNUALIZED_SHARPE_RATIO,
RatioMeasure.ANNUALIZED_SORTINO_RATIO,
],
tag_list=["HRP", "HERC"],
n_bins=50,
)
show(fig)