Stacking Optimization#

This tutorial introduces the StackingOptimization.

Stacking Optimization is an ensemble method that consists in stacking the output of individual portfolio optimizations with a final portfolio optimization.

The weights are the dot-product of individual optimizations weights with the final optimization weights.

Stacking allows to use the strength of each individual portfolio optimization by using their output as input of a final portfolio optimization.

To avoid data leakage, out-of-sample estimates are used to fit the outer optimization.

Note

The estimators_ are fitted on the full X while final_estimator_ is trained using cross-validated predictions of the base estimators using cross_val_predict.

Data#

We load the FTSE 100 dataset. This dataset is composed of the daily prices of 64 assets from the FTSE 100 Index composition starting from 2000-01-04 up to 2023-05-31:

from plotly.io import show
from sklearn.model_selection import GridSearchCV, train_test_split

from skfolio import Population, RatioMeasure, RiskMeasure
from skfolio.datasets import load_ftse100_dataset
from skfolio.metrics import make_scorer
from skfolio.model_selection import (
    CombinatorialPurgedCV,
    WalkForward,
    cross_val_predict,
    optimal_folds_number,
)
from skfolio.moments import EmpiricalCovariance, LedoitWolf
from skfolio.optimization import (
    EqualWeighted,
    HierarchicalEqualRiskContribution,
    InverseVolatility,
    MaximumDiversification,
    MeanRisk,
    ObjectiveFunction,
    StackingOptimization,
)
from skfolio.preprocessing import prices_to_returns
from skfolio.prior import EmpiricalPrior

prices = load_ftse100_dataset()

X = prices_to_returns(prices)
X_train, X_test = train_test_split(X, test_size=0.50, shuffle=False)

Stacking Model#

Our stacking model will be composed of 4 models:
  • Inverse Volatility

  • Maximum Diversification

  • Maximum Mean-Risk Utility allowing short position with L1 regularization

  • Hierarchical Equal Risk Contribution

We will stack these 4 models together using the Mean-CDaR utility maximization:

estimators = [
    ("model1", InverseVolatility()),
    ("model2", MaximumDiversification(prior_estimator=EmpiricalPrior())),
    (
        "model3",
        MeanRisk(objective_function=ObjectiveFunction.MAXIMIZE_UTILITY, min_weights=-1),
    ),
    ("model4", HierarchicalEqualRiskContribution()),
]

model_stacking = StackingOptimization(
    estimators=estimators,
    final_estimator=MeanRisk(
        objective_function=ObjectiveFunction.MAXIMIZE_UTILITY,
        risk_measure=RiskMeasure.CDAR,
    ),
)

Benchmark#

To compare the staking model, we use an equal-weighted benchmark:

benchmark = EqualWeighted()

Parameter Tuning#

To demonstrate how parameter tuning works in a staking model, we find the model parameters that maximizes the out-of-sample Calmar Ratio using GridSearchCV with WalkForward cross-validation on the training set. The WalkForward are chosen to simulate a three months (60 business days) rolling portfolio fitted on the previous year (252 business days):

cv = WalkForward(train_size=252, test_size=60)

grid_search = GridSearchCV(
    estimator=model_stacking,
    cv=cv,
    n_jobs=-1,
    param_grid={
        "model2__prior_estimator__covariance_estimator": [
            EmpiricalCovariance(),
            LedoitWolf(),
        ],
        "model3__l1_coef": [0.001, 0.1],
        "model4__risk_measure": [
            RiskMeasure.VARIANCE,
            RiskMeasure.GINI_MEAN_DIFFERENCE,
        ],
    },
    scoring=make_scorer(RatioMeasure.CALMAR_RATIO),
)
grid_search.fit(X_train)
model_stacking = grid_search.best_estimator_
print(model_stacking)
StackingOptimization(estimators=[('model1', InverseVolatility()),
                                 ('model2',
                                  MaximumDiversification(prior_estimator=EmpiricalPrior(covariance_estimator=EmpiricalCovariance()))),
                                 ('model3',
                                  MeanRisk(l1_coef=0.001, min_weights=-1,
                                           objective_function=MAXIMIZE_UTILITY)),
                                 ('model4',
                                  HierarchicalEqualRiskContribution())],
                     final_estimator=MeanRisk(objective_function=MAXIMIZE_UTILITY,
                                              risk_measure=CDaR))

Prediction#

We evaluate the Stacking model and the Benchmark using the same WalkForward object on the test set:

pred_bench = cross_val_predict(
    benchmark,
    X_test,
    cv=cv,
    portfolio_params=dict(name="Benchmark"),
)

pred_stacking = cross_val_predict(
    model_stacking,
    X_test,
    cv=cv,
    n_jobs=-1,
    portfolio_params=dict(name="Stacking"),
)

Each predicted object is a MultiPeriodPortfolio. For improved analysis, we can add them to a Population:

population = Population([pred_bench, pred_stacking])

Let’s plot the rolling portfolios cumulative returns on the test set:

population.plot_cumulative_returns()


Let’s plot the rolling portfolios compositions:

population.plot_composition(display_sub_ptf_name=False)


Analysis#

The Stacking model outperforms the Benchmark on the test set for the below ratios:

for ptf in population:
    print("=" * 25)
    print(" " * 8 + ptf.name)
    print("=" * 25)
    print(f"Sharpe ratio : {ptf.annualized_sharpe_ratio:0.2f}")
    print(f"CVaR ratio : {ptf.cdar_ratio:0.5f}")
    print(f"Calmar ratio : {ptf.calmar_ratio:0.5f}")
    print("\n")
=========================
        Benchmark
=========================
Sharpe ratio : 0.79
CVaR ratio : 0.00263
Calmar ratio : 0.00122


=========================
        Stacking
=========================
Sharpe ratio : 0.82
CVaR ratio : 0.00305
Calmar ratio : 0.00125

Let’s display the full summary:

population.summary()
Benchmark Stacking
Mean 0.051% 0.050%
Annualized Mean 12.94% 12.68%
Variance 0.011% 0.0095%
Annualized Variance 2.69% 2.39%
Semi-Variance 0.0057% 0.0052%
Annualized Semi-Variance 1.43% 1.30%
Standard Deviation 1.03% 0.97%
Annualized Standard Deviation 16.40% 15.46%
Semi-Deviation 0.75% 0.72%
Annualized Semi-Deviation 11.97% 11.39%
Mean Absolute Deviation 0.70% 0.65%
CVaR at 95% 2.45% 2.31%
EVaR at 95% 5.26% 5.24%
Worst Realization 10.72% 10.65%
CDaR at 95% 19.49% 16.49%
MAX Drawdown 42.12% 40.26%
Average Drawdown 3.83% 3.11%
EDaR at 95% 27.53% 25.88%
First Lower Partial Moment 0.35% 0.33%
Ulcer Index 0.063 0.052
Gini Mean Difference 1.05% 0.98%
Value at Risk at 95% 1.48% 1.35%
Drawdown at Risk at 95% 12.88% 9.28%
Entropic Risk Measure at 95% 3.00 3.00
Fourth Central Moment 0.000015% 0.000014%
Fourth Lower Partial Moment 0.000010% 0.000010%
Skew -48.43% -68.90%
Kurtosis 1304.03% 1604.94%
Sharpe Ratio 0.050 0.052
Annualized Sharpe Ratio 0.79 0.82
Sortino Ratio 0.068 0.070
Annualized Sortino Ratio 1.08 1.11
Mean Absolute Deviation Ratio 0.073 0.077
First Lower Partial Moment Ratio 0.15 0.15
Value at Risk Ratio at 95% 0.035 0.037
CVaR Ratio at 95% 0.021 0.022
Entropic Risk Measure Ratio at 95% 0.00017 0.00017
EVaR Ratio at 95% 0.0098 0.0096
Worst Realization Ratio 0.0048 0.0047
Drawdown at Risk Ratio at 95% 0.0040 0.0054
CDaR Ratio at 95% 0.0026 0.0031
Calmar Ratio 0.0012 0.0012
Average Drawdown Ratio 0.013 0.016
EDaR Ratio at 95% 0.0019 0.0019
Ulcer Index Ratio 0.0081 0.0096
Gini Mean Difference Ratio 0.049 0.052
Portfolios Number 45 45
Avg nb of Assets per Portfolio 64.0 64.0


Combinatorial Purged Cross-Validation#

Only using one testing path (the historical path) may not be enough for comparing both models. For a more robust analysis, we can use the CombinatorialPurgedCV to create multiple testing paths from different training folds combinations.

We choose n_folds and n_test_folds to obtain around 170 test paths and an average training size of 252 days:

n_folds, n_test_folds = optimal_folds_number(
    n_observations=X_test.shape[0],
    target_n_test_paths=170,
    target_train_size=252,
)

cv = CombinatorialPurgedCV(n_folds=n_folds, n_test_folds=n_test_folds)
cv.summary(X_test)
Number of Observations             2980
Total Number of Folds                20
Number of Test Folds                 18
Purge Size                            0
Embargo Size                          0
Average Training Size               298
Number of Test Paths                171
Number of Training Combinations     190
dtype: int64
pred_stacking = cross_val_predict(
    model_stacking,
    X_test,
    cv=cv,
    n_jobs=-1,
    portfolio_params=dict(tag="Stacking"),
)

The predicted object is a Population of MultiPeriodPortfolio. Each MultiPeriodPortfolio represents one test path of a rolling portfolio.

Distribution#

Let’s plot the out-of-sample distribution of Sharpe Ratio for the Stacking model:

pred_stacking.plot_distribution(
    measure_list=[RatioMeasure.ANNUALIZED_SHARPE_RATIO], n_bins=40
)


print(
    "Average of Sharpe Ratio :"
    f" {pred_stacking.measures_mean(measure=RatioMeasure.ANNUALIZED_SHARPE_RATIO):0.2f}"
)
print(
    "Std of Sharpe Ratio :"
    f" {pred_stacking.measures_std(measure=RatioMeasure.ANNUALIZED_SHARPE_RATIO):0.2f}"
)
Average of Sharpe Ratio : 0.84
Std of Sharpe Ratio : 0.10

Now, let’s analyze how the sub-models would have performed independently and compare their distribution with the Stacking model:

population = Population([])
for model_name, model in model_stacking.estimators:
    pred = cross_val_predict(
        model,
        X_test,
        cv=cv,
        n_jobs=-1,
        portfolio_params=dict(tag=model_name),
    )
    population.extend(pred)
population.extend(pred_stacking)

fig = population.plot_distribution(
    measure_list=[RatioMeasure.ANNUALIZED_SHARPE_RATIO],
    n_bins=40,
    tag_list=["Stacking", "model1", "model2", "model3", "model4"],
)
show(fig)

Conclusion#

The Stacking model outperforms the Benchmark on the historical test set. The distribution analysis on the recombined (non-historical) test sets shows that the Stacking model continues to outperform the Benchmark in average.

Total running time of the script: (2 minutes 4.990 seconds)

Gallery generated by Sphinx-Gallery