Drop Highly Correlated Assets#
This tutorial introduces the pre-selection transformer DropCorrelated, which removes highly correlated assets before the optimization.
Highly correlated assets tend to increase the instability of mean-variance optimization.
In this example, we will compare a mean-variance optimization with and without pre-selection.
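To make the idea of correlation-based pre-selection concrete, here is a minimal sketch in plain Python. It is a simplified greedy filter, not the selection rule implemented by DropCorrelated: the helper names `pearson` and `greedy_drop_correlated` and the toy return series are assumptions made for illustration only.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def greedy_drop_correlated(returns, threshold):
    """Keep an asset only if its correlation with every kept asset is <= threshold.

    `returns` maps an asset name to its list of returns (hypothetical layout).
    """
    kept = []
    for name, series in returns.items():
        if all(pearson(series, returns[k]) <= threshold for k in kept):
            kept.append(name)
    return kept


# Toy data: B is exactly 2 x A (correlation 1), C is nearly uncorrelated with A.
returns = {
    "A": [0.01, -0.02, 0.03, 0.00, -0.01],
    "B": [0.02, -0.04, 0.06, 0.00, -0.02],
    "C": [-0.01, 0.01, 0.00, 0.02, -0.03],
}
kept = greedy_drop_correlated(returns, threshold=0.5)
print(kept)  # ['A', 'C']
```

With a threshold of 0.5, B is dropped because it duplicates the information in A, while C is kept. The result of a greedy filter like this depends on the order of the assets, which is one reason a library implementation may use a different rule to decide which asset of a correlated pair to remove.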
Data#
We load the FTSE 100 dataset composed of the daily prices of 64 assets from the FTSE 100 Index composition starting from 2000-01-04 up to 2023-05-31:
from plotly.io import show
from sklearn import set_config
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from skfolio import Population, RatioMeasure
from skfolio.datasets import load_ftse100_dataset
from skfolio.model_selection import (
    CombinatorialPurgedCV,
    cross_val_predict,
    optimal_folds_number,
)
from skfolio.optimization import MeanRisk, ObjectiveFunction
from skfolio.pre_selection import DropCorrelated
from skfolio.preprocessing import prices_to_returns
prices = load_ftse100_dataset()
X = prices_to_returns(prices)
X_train, X_test = train_test_split(X, test_size=0.33, shuffle=False)
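As a rough illustration of the two preprocessing steps above, the following plain-Python sketch converts a price series into simple (linear) returns and splits it chronologically, as a split with shuffle=False does. It assumes linear rather than logarithmic returns; the helper names `simple_returns` and `chronological_split` and the toy prices are hypothetical.

```python
from math import ceil


def simple_returns(prices):
    """Linear returns r_t = p_t / p_{t-1} - 1 (one fewer value than prices)."""
    return [p1 / p0 - 1.0 for p0, p1 in zip(prices, prices[1:])]


def chronological_split(rows, test_size):
    """Split without shuffling: the last `test_size` fraction is the test set."""
    n_test = ceil(len(rows) * test_size)
    cut = len(rows) - n_test
    return rows[:cut], rows[cut:]


prices = [100.0, 102.0, 99.96, 101.0, 103.02, 100.0, 105.0]
rets = simple_returns(prices)
train, test = chronological_split(rets, test_size=0.33)
print(len(train), len(test))  # 4 2
```

Keeping the observations in time order means the model is always evaluated on data that comes after the data it was fitted on.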
Model#
First, we create a maximum Sharpe Ratio model without pre-selection and fit it on the training set:
model1 = MeanRisk(objective_function=ObjectiveFunction.MAXIMIZE_RATIO)
model1.fit(X_train)
model1.weights_
array([5.72489247e-08, 6.92799374e-02, 2.99565070e-02, 5.15427367e-07,
5.98248542e-08, 2.11954169e-07, 1.45472664e-07, 6.30531933e-08,
1.38474971e-01, 5.39755883e-04, 1.19668649e-06, 2.04407314e-07,
8.79495071e-03, 9.56173665e-08, 9.05345542e-08, 3.00727418e-07,
8.30609346e-02, 3.48492426e-07, 1.18995204e-07, 1.41496349e-07,
3.13544690e-02, 9.49328188e-08, 3.89233565e-02, 6.73529746e-08,
1.08791263e-01, 1.03983576e-07, 1.81104294e-01, 1.71678031e-07,
6.17026726e-08, 1.85876177e-07, 1.06482315e-07, 5.09757679e-08,
1.97934913e-06, 4.57814906e-08, 8.41631508e-02, 6.52279083e-08,
6.16740873e-03, 1.07868906e-07, 1.72699953e-07, 8.59571972e-08,
1.12484994e-01, 1.77846764e-07, 7.68211297e-08, 8.46476487e-08,
9.91391602e-08, 1.08820473e-07, 9.52131430e-08, 4.71020368e-07,
9.27208518e-08, 1.40414869e-07, 3.82148954e-03, 8.28101461e-02,
3.04499311e-03, 7.93434219e-08, 1.98631697e-07, 1.72168221e-02,
1.14241628e-07, 1.24128911e-07, 2.22757657e-07, 3.33244835e-07,
1.59411484e-07, 4.79955612e-07, 9.26676073e-08, 5.55225622e-07])
Pipeline#
Then, we create a maximum Sharpe ratio model with pre-selection using a Pipeline and fit it on the training set:
set_config(transform_output="pandas")
model2 = Pipeline(
    [
        ("pre_selection", DropCorrelated(threshold=0.5)),
        ("optimization", MeanRisk(objective_function=ObjectiveFunction.MAXIMIZE_RATIO)),
    ]
)
model2.fit(X_train)
model2.named_steps["optimization"].weights_
array([8.18629046e-02, 2.99990921e-02, 1.16397541e-06, 3.33347825e-07,
1.82548038e-01, 2.85482485e-03, 3.56265072e-06, 1.10513944e-02,
2.32253197e-07, 2.05141268e-07, 7.34710301e-07, 8.21495217e-02,
9.52097597e-07, 3.41747905e-07, 3.58028595e-02, 4.20420351e-02,
1.56245104e-07, 2.35646825e-07, 1.85053358e-01, 3.80020780e-07,
2.42398710e-07, 1.15032452e-03, 1.09539530e-07, 9.00503120e-02,
2.41991003e-07, 3.83742726e-07, 1.23265556e-01, 3.91693342e-07,
1.83251807e-07, 2.09337576e-07, 2.38594386e-07, 2.26123969e-07,
1.11576161e-06, 2.18898632e-07, 8.33465412e-03, 8.57587244e-02,
1.29547016e-02, 1.87531533e-07, 4.25937609e-07, 2.51052265e-02,
2.73616225e-07, 3.03373275e-07, 5.56618079e-07, 3.50951178e-07,
1.05787166e-06, 1.45723457e-06])
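The weights array of the pipeline model is shorter than that of the first model because the optimizer only sees the assets that survive pre-selection. The toy sketch below mimics that chaining in plain Python. The classes `KeepAssets`, `EqualWeight` and `TinyPipeline` are hypothetical stand-ins written for this illustration; they are not part of scikit-learn or skfolio.

```python
class KeepAssets:
    """Hypothetical pre-selection step that keeps a fixed list of assets."""

    def __init__(self, names):
        self.names = names

    def fit(self, X):
        return self

    def transform(self, X):
        # X maps asset name -> list of returns; keep only the selected assets.
        return {name: X[name] for name in self.names}


class EqualWeight:
    """Hypothetical optimizer that assigns equal weights to the assets it sees."""

    def fit(self, X):
        self.weights_ = {name: 1.0 / len(X) for name in X}
        return self


class TinyPipeline:
    """Chains one transformer and one final estimator."""

    def __init__(self, selector, optimizer):
        self.selector = selector
        self.optimizer = optimizer

    def fit(self, X):
        # The optimizer is fitted on the transformed (reduced) data only.
        X_selected = self.selector.fit(X).transform(X)
        self.optimizer.fit(X_selected)
        return self


X = {"A": [0.01, -0.02], "B": [0.02, -0.04], "C": [-0.01, 0.01]}
pipe = TinyPipeline(KeepAssets(["A", "C"]), EqualWeight()).fit(X)
print(pipe.optimizer.weights_)  # {'A': 0.5, 'C': 0.5}
```

The dropped asset B receives no weight at all, rather than a weight close to zero, because it never reaches the optimization step.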
Prediction#
We predict both models on the test set:
ptf1 = model1.predict(X_test)
ptf1.name = "model1"
ptf2 = model2.predict(X_test)
ptf2.name = "model2"
print(ptf1.n_assets)
print(ptf2.n_assets)
64
46
Each predicted object is a Portfolio.
For improved analysis, we can add them to a Population:
population = Population([ptf1, ptf2])
Let’s plot the portfolios' cumulative returns on the test set:
population.plot_cumulative_returns()
Combinatorial Purged Cross-Validation#
Using only one testing path (the historical path) may not be enough to compare both models. For a more robust analysis, we can use CombinatorialPurgedCV to create multiple testing paths from different combinations of training folds.
We choose n_folds and n_test_folds to obtain around 100 test paths and an average training size of 800 days:
n_folds, n_test_folds = optimal_folds_number(
    n_observations=X_test.shape[0],
    target_n_test_paths=100,
    target_train_size=800,
)
cv = CombinatorialPurgedCV(n_folds=n_folds, n_test_folds=n_test_folds)
cv.summary(X_test)
Number of Observations             1967
Total Number of Folds                10
Number of Test Folds                  6
Purge Size                            0
Embargo Size                          0
Average Training Size               786
Number of Test Paths                126
Number of Training Combinations     210
dtype: int64
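The counts in this summary can be reproduced with basic combinatorics. The sketch below assumes folds of roughly equal size and zero purge and embargo, as in the summary; the variable names are chosen for illustration and it does not call skfolio.

```python
from math import comb

n_observations = 1967
n_folds = 10
n_test_folds = 6

# Each split chooses which folds are used for testing.
n_training_combinations = comb(n_folds, n_test_folds)

# Every fold appears as a test fold in the same number of splits, so the
# test predictions can be recombined into this many full-length paths.
n_test_paths = n_training_combinations * n_test_folds // n_folds

# Each split trains on the remaining folds.
avg_train_size = n_observations * (n_folds - n_test_folds) // n_folds

print(n_training_combinations, n_test_paths, avg_train_size)  # 210 126 786
```

These values match the summary: 210 training combinations, 126 test paths and an average training size of 786 observations, close to the targets of 100 paths and 800 days.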
pred_1 = cross_val_predict(
    model1,
    X_test,
    cv=cv,
    n_jobs=-1,
    portfolio_params=dict(annualized_factor=252, tag="model1"),
)
pred_2 = cross_val_predict(
    model2,
    X_test,
    cv=cv,
    n_jobs=-1,
    portfolio_params=dict(annualized_factor=252, tag="model2"),
)
The predicted object is a Population of MultiPeriodPortfolio. Each MultiPeriodPortfolio represents one testing path of a rolling portfolio.
For improved analysis, we can merge the populations of each model:
population = pred_1 + pred_2
Distribution#
We plot the out-of-sample distribution of the Sharpe ratio for both models:
fig = population.plot_distribution(
    measure_list=[RatioMeasure.SHARPE_RATIO], tag_list=["model1", "model2"], n_bins=40
)
show(fig)
Model 1:
print(
    "Average of Sharpe Ratio:"
    f" {pred_1.measures_mean(measure=RatioMeasure.ANNUALIZED_SHARPE_RATIO):0.2f}"
)
print(
    "Std of Sharpe Ratio:"
    f" {pred_1.measures_std(measure=RatioMeasure.ANNUALIZED_SHARPE_RATIO):0.2f}"
)
Average of Sharpe Ratio: 0.46
Std of Sharpe Ratio: 0.20
Model 2:
print(
    "Average of Sharpe Ratio:"
    f" {pred_2.measures_mean(measure=RatioMeasure.ANNUALIZED_SHARPE_RATIO):0.2f}"
)
print(
    "Std of Sharpe Ratio:"
    f" {pred_2.measures_std(measure=RatioMeasure.ANNUALIZED_SHARPE_RATIO):0.2f}"
)
Average of Sharpe Ratio: 0.51
Std of Sharpe Ratio: 0.21
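The printed values use the annualized Sharpe ratio with an annualization factor of 252. As a reminder of what that measure represents, here is a minimal plain-Python sketch. It assumes a zero risk-free rate and the sample standard deviation, which may differ from the library's exact conventions; the function name and the toy returns are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def annualized_sharpe_ratio(returns, annualized_factor=252):
    """Annualized mean divided by annualized standard deviation.

    Assumes a zero risk-free rate and the sample standard deviation.
    """
    annualized_mean = mean(returns) * annualized_factor
    annualized_std = stdev(returns) * sqrt(annualized_factor)
    return annualized_mean / annualized_std


daily_returns = [0.004, -0.002, 0.003, 0.001, -0.001, 0.002]
daily_sharpe = mean(daily_returns) / stdev(daily_returns)
sharpe = annualized_sharpe_ratio(daily_returns)
print(f"{sharpe:0.2f}")
```

Under these assumptions the annualized ratio is the daily ratio scaled by the square root of 252. The averages and standard deviations reported for each model are then taken across the test paths of its population.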