Drop Highly Correlated Assets
This tutorial introduces the pre-selection transformer DropCorrelated, which removes highly correlated assets before the optimization.
Highly correlated assets tend to increase the instability of mean-variance optimization.
In this example, we compare a mean-variance optimization with and without pre-selection.
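The idea behind dropping correlated assets can be sketched in plain pandas (a minimal illustration of the concept, not skfolio's actual implementation): compute the pairwise correlation matrix of the returns and, for each pair whose absolute correlation exceeds the threshold, drop one of the two assets.

```python
import numpy as np
import pandas as pd


def drop_correlated(returns: pd.DataFrame, threshold: float = 0.95) -> pd.DataFrame:
    """Drop one asset of each pair whose absolute correlation exceeds threshold.

    Simplified sketch of the pre-selection idea; skfolio's DropCorrelated
    transformer is the tested implementation.
    """
    corr = returns.corr().abs()
    # Keep only the upper triangle so each pair is considered once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return returns.drop(columns=to_drop)


rng = np.random.default_rng(0)
base = rng.normal(size=500)
returns = pd.DataFrame(
    {
        "A": base + 0.01 * rng.normal(size=500),  # nearly identical to "B"
        "B": base + 0.01 * rng.normal(size=500),
        "C": rng.normal(size=500),  # independent asset
    }
)
filtered = drop_correlated(returns, threshold=0.95)
print(list(filtered.columns))  # "B" is dropped; "A" and "C" remain
```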
Data
We load the FTSE 100 dataset composed of the daily prices of 64 assets from the FTSE 100 Index composition starting from 2000-01-04 up to 2023-05-31:
from plotly.io import show
from sklearn import set_config
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from skfolio import Population, RatioMeasure
from skfolio.datasets import load_ftse100_dataset
from skfolio.model_selection import (
CombinatorialPurgedCV,
cross_val_predict,
optimal_folds_number,
)
from skfolio.optimization import MeanRisk, ObjectiveFunction
from skfolio.pre_selection import DropCorrelated
from skfolio.preprocessing import prices_to_returns
prices = load_ftse100_dataset()
X = prices_to_returns(prices)
X_train, X_test = train_test_split(X, test_size=0.33, shuffle=False)
Model
First, we create a maximum Sharpe Ratio model without pre-selection and fit it on the training set:
model1 = MeanRisk(objective_function=ObjectiveFunction.MAXIMIZE_RATIO)
model1.fit(X_train)
model1.weights_
array([1.11484352e-11, 6.92866673e-02, 2.99620661e-02, 1.10758846e-10,
1.16254780e-11, 4.14964179e-11, 2.84069930e-11, 1.22736942e-11,
1.38469617e-01, 4.65046123e-04, 2.67826275e-10, 4.03454161e-11,
8.79558938e-03, 1.85405564e-11, 1.75767841e-11, 5.92183632e-11,
8.30675668e-02, 6.89279675e-11, 2.31563874e-11, 2.74514930e-11,
3.13808863e-02, 1.84375687e-11, 3.89248471e-02, 1.30775206e-11,
1.08797657e-01, 2.02346903e-11, 1.81108516e-01, 3.40116574e-11,
1.18849904e-11, 3.64862543e-11, 2.09437885e-11, 9.84526736e-12,
5.12587013e-10, 8.82872903e-12, 8.41661311e-02, 1.27314017e-11,
6.20495386e-03, 2.09702783e-11, 3.38535227e-11, 1.68752474e-11,
1.12489362e-01, 3.53621465e-11, 1.48148340e-11, 1.63374918e-11,
1.94513306e-11, 2.11982861e-11, 1.84887386e-11, 9.74395933e-11,
1.79633825e-11, 2.74523289e-11, 3.80823150e-03, 8.28133405e-02,
2.98247367e-03, 1.54391616e-11, 3.96913808e-11, 1.72770463e-02,
2.20575709e-11, 2.42859778e-11, 4.36110766e-11, 6.52209193e-11,
3.12158920e-11, 1.00965673e-10, 1.80664221e-11, 1.14323018e-10])
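Weights on the order of 1e-11 are numerical zeros left by the solver. To read the allocation, it can help to collect the weights into a Series and keep only the material positions (a small hypothetical helper, shown here on toy data):

```python
import numpy as np
import pandas as pd


def material_weights(weights, names, tol=1e-6):
    """Return the non-negligible portfolio weights, largest first.

    ``tol`` filters out solver noise (entries that are numerically zero).
    """
    s = pd.Series(weights, index=names)
    return s[s > tol].sort_values(ascending=False)


# Toy weight vector with solver-style near-zero entries:
w = np.array([0.60, 1.1e-11, 0.25, 3.0e-10, 0.15])
result = material_weights(w, ["AAA", "BBB", "CCC", "DDD", "EEE"])
print(result)  # AAA 0.60, CCC 0.25, EEE 0.15
```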
Pipeline
Then, we create a maximum Sharpe ratio model with pre-selection using Pipeline and fit it on the training set:
set_config(transform_output="pandas")
model2 = Pipeline(
[
("pre_selection", DropCorrelated(threshold=0.5)),
("optimization", MeanRisk(objective_function=ObjectiveFunction.MAXIMIZE_RATIO)),
]
)
model2.fit(X_train)
model2.named_steps["optimization"].weights_
array([8.17183301e-02, 2.99627408e-02, 5.75907600e-11, 2.59687833e-11,
1.82564384e-01, 2.92880653e-03, 1.52217700e-10, 1.13239659e-02,
1.76926114e-11, 1.62035863e-11, 4.94836568e-11, 8.20549958e-02,
6.08919173e-11, 2.64696218e-11, 3.58312606e-02, 4.20210608e-02,
1.25426519e-11, 1.83697624e-11, 1.84916828e-01, 2.91846109e-11,
1.96049980e-11, 1.05201907e-03, 8.54482537e-12, 8.99284245e-02,
1.91471166e-11, 2.98391244e-11, 1.23117943e-01, 2.98558904e-11,
1.40973224e-11, 1.60306246e-11, 1.91741539e-11, 1.73658093e-11,
5.65079997e-11, 1.67279164e-11, 8.53270264e-03, 8.56311233e-02,
1.32678922e-02, 1.48213167e-11, 3.17212380e-11, 2.51475217e-02,
2.12051211e-11, 2.42766417e-11, 4.17359260e-11, 2.77319466e-11,
5.40287802e-11, 7.07120393e-11])
Prediction
We predict both models on the test set:
ptf1 = model1.predict(X_test)
ptf1.name = "model1"
ptf2 = model2.predict(X_test)
ptf2.name = "model2"
print(ptf1.n_assets)
print(ptf2.n_assets)
64
46
Each predicted object is a MultiPeriodPortfolio.
For improved analysis, we can add them to a Population:
population = Population([ptf1, ptf2])
Let’s plot the portfolios cumulative returns on the test set:
population.plot_cumulative_returns()
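Under the hood, a cumulative-return path compounds the simple daily returns; a minimal numpy version (assuming simple, not log, returns):

```python
import numpy as np


def cumulative_returns(returns: np.ndarray) -> np.ndarray:
    """Compound simple periodic returns into a cumulative-return path."""
    return np.cumprod(1.0 + returns) - 1.0


r = np.array([0.01, -0.02, 0.03])
path = cumulative_returns(r)
print(path)  # final value: 1.01 * 0.98 * 1.03 - 1 ≈ 0.0195
```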
Combinatorial Purged Cross-Validation
Using only one testing path (the historical path) may not be enough to compare both models. For a more robust analysis, we can use the CombinatorialPurgedCV to create multiple testing paths from different training fold combinations.
We choose n_folds and n_test_folds to obtain around 100 test paths and an average training size of 800 days:
n_folds, n_test_folds = optimal_folds_number(
n_observations=X_test.shape[0],
target_n_test_paths=100,
target_train_size=800,
)
cv = CombinatorialPurgedCV(n_folds=n_folds, n_test_folds=n_test_folds)
cv.summary(X_test)
Number of Observations 1967
Total Number of Folds 10
Number of Test Folds 6
Purge Size 0
Embargo Size 0
Average Training Size 786
Number of Test Paths 126
Number of Training Combinations 210
dtype: int64
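The reported numbers are consistent with the combinatorics of combinatorial purged cross-validation: with N folds of which k are held out for testing, there are C(N, k) training combinations, and each fold appears in C(N-1, k-1) = C(N, k) · k / N of them, which is the number of test paths. A quick check for N=10, k=6:

```python
from math import comb

n_folds, n_test_folds = 10, 6

# Number of distinct train/test fold combinations:
n_combinations = comb(n_folds, n_test_folds)

# Number of test paths: each fold is tested in C(N-1, k-1) combinations.
n_test_paths = n_combinations * n_test_folds // n_folds

print(n_combinations, n_test_paths)  # 210 126
```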
pred_1 = cross_val_predict(
model1,
X_test,
cv=cv,
n_jobs=-1,
portfolio_params=dict(annualized_factor=252, tag="model1"),
)
pred_2 = cross_val_predict(
model2,
X_test,
cv=cv,
n_jobs=-1,
portfolio_params=dict(annualized_factor=252, tag="model2"),
)
The predicted object is a Population of MultiPeriodPortfolio. Each MultiPeriodPortfolio represents one testing path of a rolling portfolio.
For improved analysis, we can merge the populations of each model:
population = pred_1 + pred_2
Distribution
We plot the out-of-sample distribution of the Sharpe ratio for both models:
fig = population.plot_distribution(
measure_list=[RatioMeasure.SHARPE_RATIO], tag_list=["model1", "model2"], n_bins=40
)
show(fig)
Model 1:
print(
"Average of Sharpe Ratio:"
f" {pred_1.measures_mean(measure=RatioMeasure.ANNUALIZED_SHARPE_RATIO):0.2f}"
)
print(
"Std of Sharpe Ratio:"
f" {pred_1.measures_std(measure=RatioMeasure.ANNUALIZED_SHARPE_RATIO):0.2f}"
)
Average of Sharpe Ratio: 0.46
Std of Sharpe Ratio: 0.20
Model 2:
print(
"Average of Sharpe Ratio:"
f" {pred_2.measures_mean(measure=RatioMeasure.ANNUALIZED_SHARPE_RATIO):0.2f}"
)
print(
"Std of Sharpe Ratio:"
f" {pred_2.measures_std(measure=RatioMeasure.ANNUALIZED_SHARPE_RATIO):0.2f}"
)
Average of Sharpe Ratio: 0.51
Std of Sharpe Ratio: 0.21
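The annualized Sharpe ratios above follow the usual convention of scaling the per-period (daily) ratio by the square root of the number of periods per year; a minimal sketch, assuming a zero risk-free rate:

```python
import numpy as np


def annualized_sharpe(daily_returns: np.ndarray, periods_per_year: int = 252) -> float:
    """Daily mean/std Sharpe ratio scaled to annual frequency (risk-free rate = 0)."""
    daily_sharpe = np.mean(daily_returns) / np.std(daily_returns, ddof=1)
    return float(daily_sharpe * np.sqrt(periods_per_year))


rng = np.random.default_rng(42)
r = rng.normal(loc=0.0004, scale=0.01, size=1000)  # synthetic daily returns
print(f"{annualized_sharpe(r):0.2f}")
```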
Total running time of the script: (0 minutes 5.738 seconds)