
# Stacking Optimization

This tutorial introduces the `StackingOptimization` estimator.

Stacking Optimization is an ensemble method that consists of stacking the outputs of individual portfolio optimizations inside a final portfolio optimization.

The final weights are the dot-product of the individual optimizations' weights with the final optimization's weights.

Stacking leverages the strength of each individual portfolio optimization by using its output as the input of a final portfolio optimization.

To avoid data leakage, out-of-sample estimates are used to fit the outer optimization.

Note

The `estimators_` are fitted on the full `X` while `final_estimator_` is trained using cross-validated predictions of the base estimators obtained with `cross_val_predict`.
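As a quick illustration of the dot-product above, here is a minimal sketch with made-up weights (these numbers are hypothetical, not output from skfolio):

```python
import numpy as np

# Hypothetical inner weights: 2 sub-models allocating over 3 assets
# (each row sums to 1).
inner_weights = np.array([
    [0.5, 0.3, 0.2],  # sub-model 1
    [0.2, 0.2, 0.6],  # sub-model 2
])
# Hypothetical weights assigned by the final optimization to each sub-model.
outer_weights = np.array([0.7, 0.3])

# Final portfolio weights: dot-product of outer and inner weights.
final_weights = outer_weights @ inner_weights
print(final_weights)  # approximately [0.41, 0.27, 0.32]
```

Note that the final weights still sum to 1 whenever both the inner and outer weights do.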

## Data

We load the FTSE 100 dataset. This dataset is composed of the daily prices of 64 assets from the FTSE 100 Index composition starting from 2000-01-04 up to 2023-05-31:

```
from plotly.io import show
from sklearn.model_selection import GridSearchCV, train_test_split

from skfolio import Population, RatioMeasure, RiskMeasure
from skfolio.datasets import load_ftse100_dataset
from skfolio.metrics import make_scorer
from skfolio.model_selection import (
    CombinatorialPurgedCV,
    WalkForward,
    cross_val_predict,
    optimal_folds_number,
)
from skfolio.moments import EmpiricalCovariance, LedoitWolf
from skfolio.optimization import (
    EqualWeighted,
    HierarchicalEqualRiskContribution,
    InverseVolatility,
    MaximumDiversification,
    MeanRisk,
    ObjectiveFunction,
    StackingOptimization,
)
from skfolio.preprocessing import prices_to_returns
from skfolio.prior import EmpiricalPrior

prices = load_ftse100_dataset()
X = prices_to_returns(prices)

X_train, X_test = train_test_split(X, test_size=0.50, shuffle=False)
```
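A quick sketch of why `shuffle=False` matters here: the split must preserve chronological order so that no future observation leaks into the training set. The index arithmetic below mirrors a 50% time-series split (an illustration, not sklearn's internal logic):

```python
import numpy as np

n_observations = 10
n_test = int(n_observations * 0.50)  # same rule as test_size=0.50
indices = np.arange(n_observations)

# shuffle=False: the first half is training data, the second half test data.
train_idx, test_idx = indices[:-n_test], indices[-n_test:]
print(train_idx.max() < test_idx.min())  # True: no look-ahead
```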

## Stacking Model

Our stacking model will be composed of 4 models:

- Inverse Volatility
- Maximum Diversification
- Maximum Mean-Risk Utility allowing short positions with L1 regularization
- Hierarchical Equal Risk Contribution

We will stack these 4 models together using Mean-CDaR utility maximization:

```
estimators = [
    ("model1", InverseVolatility()),
    ("model2", MaximumDiversification(prior_estimator=EmpiricalPrior())),
    (
        "model3",
        MeanRisk(objective_function=ObjectiveFunction.MAXIMIZE_UTILITY, min_weights=-1),
    ),
    ("model4", HierarchicalEqualRiskContribution()),
]

model_stacking = StackingOptimization(
    estimators=estimators,
    final_estimator=MeanRisk(
        objective_function=ObjectiveFunction.MAXIMIZE_UTILITY,
        risk_measure=RiskMeasure.CDAR,
    ),
)
```

## Benchmark

To compare the stacking model, we use an equal-weighted benchmark:

```
benchmark = EqualWeighted()
```

## Parameter Tuning

To demonstrate how parameter tuning works in a stacking model, we find the model parameters that maximize the out-of-sample Calmar Ratio using `GridSearchCV` with `WalkForward` cross-validation on the training set. The `WalkForward` parameters are chosen to simulate a three-month (60 business days) rolling portfolio fitted on the previous year (252 business days):

```
cv = WalkForward(train_size=252, test_size=60)

grid_search = GridSearchCV(
    estimator=model_stacking,
    cv=cv,
    n_jobs=-1,
    param_grid={
        "model2__prior_estimator__covariance_estimator": [
            EmpiricalCovariance(),
            LedoitWolf(),
        ],
        "model3__l1_coef": [0.001, 0.1],
        "model4__risk_measure": [
            RiskMeasure.VARIANCE,
            RiskMeasure.GINI_MEAN_DIFFERENCE,
        ],
    },
    scoring=make_scorer(RatioMeasure.CALMAR_RATIO),
)
grid_search.fit(X_train)
model_stacking = grid_search.best_estimator_
print(model_stacking)
```
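The rolling scheme can be sketched with plain Python. This simplified generator is my own illustration of the idea, not skfolio's actual `WalkForward` implementation; it rolls the training window forward by one test period at a time:

```python
def walk_forward_splits(n_observations, train_size, test_size):
    """Yield (train, test) index ranges for a simplified walk-forward split."""
    start = 0
    while start + train_size + test_size <= n_observations:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        start += test_size  # roll the window forward by one test period

splits = list(walk_forward_splits(n_observations=600, train_size=252, test_size=60))
print(len(splits))  # 5 rolling windows fit inside 600 observations
```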

```
StackingOptimization(estimators=[('model1', InverseVolatility()),
('model2',
MaximumDiversification(prior_estimator=EmpiricalPrior(covariance_estimator=EmpiricalCovariance()))),
('model3',
MeanRisk(l1_coef=0.001, min_weights=-1,
objective_function=MAXIMIZE_UTILITY)),
('model4',
HierarchicalEqualRiskContribution())],
final_estimator=MeanRisk(objective_function=MAXIMIZE_UTILITY,
risk_measure=CDaR))
```

## Prediction

We evaluate the Stacking model and the Benchmark using the same `WalkForward` object on the test set:

```
pred_bench = cross_val_predict(
    benchmark,
    X_test,
    cv=cv,
    portfolio_params=dict(name="Benchmark"),
)

pred_stacking = cross_val_predict(
    model_stacking,
    X_test,
    cv=cv,
    n_jobs=-1,
    portfolio_params=dict(name="Stacking"),
)
```

Each predicted object is a `MultiPeriodPortfolio`. For improved analysis, we can add them to a `Population`:

```
population = Population([pred_bench, pred_stacking])
```

Let’s plot the rolling portfolios’ cumulative returns on the test set:

```
population.plot_cumulative_returns()
```

Let’s plot the rolling portfolios’ compositions:

```
population.plot_composition(display_sub_ptf_name=False)
```

## Analysis

The Stacking model outperforms the Benchmark on the test set for the ratios below:

```
for ptf in population:
    print("=" * 25)
    print(" " * 8 + ptf.name)
    print("=" * 25)
    print(f"Sharpe ratio : {ptf.annualized_sharpe_ratio:0.2f}")
    print(f"CDaR ratio : {ptf.cdar_ratio:0.5f}")
    print(f"Calmar ratio : {ptf.calmar_ratio:0.5f}")
    print("\n")
```

```
=========================
        Benchmark
=========================
Sharpe ratio : 0.79
CDaR ratio : 0.00263
Calmar ratio : 0.00122


=========================
        Stacking
=========================
Sharpe ratio : 0.82
CDaR ratio : 0.00305
Calmar ratio : 0.00125
```
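For reference, the Calmar ratio compares mean return against maximum drawdown. Here is a minimal, non-annualized sketch of that computation (my own simplification, not skfolio's implementation):

```python
import numpy as np

def calmar_ratio(returns):
    """Mean return divided by the maximum drawdown of the cumulative returns."""
    cumulative = np.cumprod(1 + returns)
    running_max = np.maximum.accumulate(cumulative)
    max_drawdown = (1 - cumulative / running_max).max()
    return returns.mean() / max_drawdown

returns = np.array([0.01, -0.02, 0.015, -0.005, 0.02])
print(round(calmar_ratio(returns), 3))  # 0.2
```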

Let’s display the full summary:

```
population.summary()
```

## Combinatorial Purged Cross-Validation

Using only one testing path (the historical path) may not be enough to compare both models. For a more robust analysis, we can use `CombinatorialPurgedCV` to create multiple testing paths from different training fold combinations.

We choose `n_folds` and `n_test_folds` to obtain around 170 test paths and an average training size of 252 days:

```
n_folds, n_test_folds = optimal_folds_number(
    n_observations=X_test.shape[0],
    target_n_test_paths=170,
    target_train_size=252,
)

cv = CombinatorialPurgedCV(n_folds=n_folds, n_test_folds=n_test_folds)
cv.summary(X_test)
```

```
Number of Observations 2980
Total Number of Folds 20
Number of Test Folds 18
Purge Size 0
Embargo Size 0
Average Training Size 298
Number of Test Paths 171
Number of Training Combinations 190
dtype: int64
```
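The counts in the summary above can be checked combinatorially. Assuming the standard combinatorial cross-validation arithmetic (each choice of test folds gives one training combination, and each combination contributes a fraction `n_test_folds / n_folds` of a full test path):

```python
from math import comb

n_folds, n_test_folds = 20, 18

n_train_combinations = comb(n_folds, n_test_folds)
n_test_paths = n_train_combinations * n_test_folds // n_folds

print(n_train_combinations)  # 190 training combinations
print(n_test_paths)          # 171 test paths
```

Both numbers match the `cv.summary(X_test)` output above.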

```
pred_stacking = cross_val_predict(
    model_stacking,
    X_test,
    cv=cv,
    n_jobs=-1,
    portfolio_params=dict(tag="Stacking"),
)
```

The predicted object is a `Population` of `MultiPeriodPortfolio`. Each `MultiPeriodPortfolio` represents one test path of a rolling portfolio.

## Distribution

Let’s plot the out-of-sample distribution of the Sharpe Ratio for the Stacking model:

```
pred_stacking.plot_distribution(
    measure_list=[RatioMeasure.ANNUALIZED_SHARPE_RATIO], n_bins=40
)
```

```
print(
    "Average of Sharpe Ratio :"
    f" {pred_stacking.measures_mean(measure=RatioMeasure.ANNUALIZED_SHARPE_RATIO):0.2f}"
)
print(
    "Std of Sharpe Ratio :"
    f" {pred_stacking.measures_std(measure=RatioMeasure.ANNUALIZED_SHARPE_RATIO):0.2f}"
)
```

```
Average of Sharpe Ratio : 0.84
Std of Sharpe Ratio : 0.10
```
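As a reminder of the measure being plotted, the annualized Sharpe ratio scales the daily mean-to-volatility ratio by the square root of the number of trading periods per year. A minimal sketch assuming a zero risk-free rate (my simplification, not skfolio's `RatioMeasure` implementation):

```python
import numpy as np

def annualized_sharpe_ratio(returns, periods_per_year=252):
    """Annualized mean return over annualized volatility (risk-free rate = 0)."""
    return np.sqrt(periods_per_year) * returns.mean() / returns.std(ddof=1)

daily_returns = np.array([0.01, -0.005, 0.02])
print(round(annualized_sharpe_ratio(daily_returns), 2))  # 10.51
```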

Now, let’s analyze how the sub-models would have performed independently and compare their distribution with the Stacking model:

```
population = Population([])
for model_name, model in model_stacking.estimators:
    pred = cross_val_predict(
        model,
        X_test,
        cv=cv,
        n_jobs=-1,
        portfolio_params=dict(tag=model_name),
    )
    population.extend(pred)

population.extend(pred_stacking)

fig = population.plot_distribution(
    measure_list=[RatioMeasure.ANNUALIZED_SHARPE_RATIO],
    n_bins=40,
    tag_list=["Stacking", "model1", "model2", "model3", "model4"],
)
show(fig)
```

## Conclusion

The Stacking model outperforms the Benchmark on the historical test set. The distribution analysis on the recombined (non-historical) test sets shows that the Stacking model continues to outperform the Benchmark on average.

**Total running time of the script:** (2 minutes 6.686 seconds)