Note

Go to the end to download the full example code. or to run this example in your browser via JupyterLite or Binder

Uncertainty Set#

This tutorial shows how to incorporate expected returns uncertainty sets into the MeanRisk optimization.

By using the Mu Uncertainty set estimator, the assets expected returns are modelled with an ellipsoidal uncertainty set. This approach, known as worst-case optimization, falls under the umbrella of robust optimization. It reduces the instability that arises from the estimation errors of the expected returns.

The worst case portfolio expect return is:

\[w^T\hat{\mu} - \kappa_{\mu}\lVert S_{\mu}^\frac{1}{2}w\rVert_{2}\]

with \(\kappa\) the size of the ellipsoid (confidence region) and \(S\) its shape.

In this example, we will use a Mean-CVaR model with an EmpiricalMuUncertaintySet estimator.

Note that other uncertainty set can be used, for example: BootstrapMuUncertaintySet.

Data#

We load the FTSE 100 dataset composed of the daily prices of 64 assets from the FTSE 100 Index composition starting from 2000-01-04 up to 2023-05-31:

import numpy as np
import plotly.graph_objects as go
from plotly.io import show
from scipy.stats import uniform
from sklearn import clone
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV, train_test_split

from skfolio import PerfMeasure, Population, RatioMeasure, RiskMeasure
from skfolio.datasets import load_ftse100_dataset
from skfolio.metrics import make_scorer
from skfolio.model_selection import WalkForward, cross_val_predict
from skfolio.optimization import MeanRisk, ObjectiveFunction
from skfolio.preprocessing import prices_to_returns
from skfolio.uncertainty_set import EmpiricalMuUncertaintySet

prices = load_ftse100_dataset()

X = prices_to_returns(prices)
X_train, X_test = train_test_split(X, test_size=0.33, shuffle=False)

Efficient Frontier#

First, we create a Mean-CVaR model to estimate the efficient frontier without uncertainty set. We constrain the CVaR at 95% to be below 2% (representing the average loss of the worst 5% daily returns over the period):

model = MeanRisk(
    risk_measure=RiskMeasure.CVAR,
    min_weights=-1,
    max_cvar=0.02,
    efficient_frontier_size=20,
    portfolio_params=dict(name="Mean-CVaR", tag="No Uncertainty Set"),
)
model.fit(X_train)
model.weights_.shape

(20, 64)

Now, we create a robust (worst case) Mean-CVaR model with an uncertainty set on the expected returns:

model_uncertainty = MeanRisk(
    risk_measure=RiskMeasure.CVAR,
    min_weights=-1,
    max_cvar=0.02,
    efficient_frontier_size=20,
    mu_uncertainty_set_estimator=EmpiricalMuUncertaintySet(confidence_level=0.60),
    portfolio_params=dict(name="Mean-CVaR", tag="Mu Uncertainty Set - 60%"),
)
model_uncertainty.fit(X_train)
model_uncertainty.weights_.shape

(20, 64)

Let’s plot both efficient frontiers on the training set:

population_train = model.predict(X_train) + model_uncertainty.predict(X_train)

population_train.plot_measures(
    x=RiskMeasure.CVAR,
    y=PerfMeasure.ANNUALIZED_MEAN,
    color_scale=RatioMeasure.ANNUALIZED_SHARPE_RATIO,
    hover_measures=[RiskMeasure.MAX_DRAWDOWN, RatioMeasure.ANNUALIZED_SORTINO_RATIO],
)

Hyper-Parameter Tuning#

In this section, we consider a 3 months rolling (60 business days) long-short allocation fitted on the preceding year of data (252 business days) that maximizes the portfolio return under a CVaR constraint. We will use GridSearchCV to select the below model parameters on the training set using walk forward analysis with a Mean/CVaR ratio scoring.

The model parameters to tune are:

max_cvar: CVaR target (upper constraint)

cvar_beta: CVaR confidence level

confidence_level: Mu uncertainty set confidence level of the EmpiricalMuUncertaintySet

For embedded parameters in the GridSearchCV, you need to use a double underscore: mu_uncertainty_set_estimator__confidence_level

model_no_uncertainty = MeanRisk(
    risk_measure=RiskMeasure.CVAR,
    objective_function=ObjectiveFunction.MAXIMIZE_RETURN,
    max_cvar=0.02,
    cvar_beta=0.9,
    min_weights=-1,
)

model_uncertainty = clone(model_no_uncertainty)
model_uncertainty.set_params(mu_uncertainty_set_estimator=EmpiricalMuUncertaintySet())

cv = WalkForward(train_size=252, test_size=60)

grid_search = GridSearchCV(
    estimator=model_uncertainty,
    cv=cv,
    n_jobs=-1,
    param_grid={
        "mu_uncertainty_set_estimator__confidence_level": [0.80, 0.90],
        "max_cvar": [0.03, 0.04, 0.05],
        "cvar_beta": [0.8, 0.9, 0.95],
    },
    scoring=make_scorer(RatioMeasure.CVAR_RATIO),
)
grid_search.fit(X_train)
best_model = grid_search.best_estimator_
print(best_model)

MeanRisk(cvar_beta=0.9, max_cvar=0.03, min_weights=-1,
         mu_uncertainty_set_estimator=EmpiricalMuUncertaintySet(confidence_level=0.8),
         objective_function=MAXIMIZE_RETURN, risk_measure=CVaR)

The optimal parameters among the above 2x3x3 grid are the max_cvar=3%, cvar_beta=90% and EmpiricalMuUncertaintySet confidence_level=80%. These parameters are the ones that achieved the highest mean out-of-sample Mean/CVaR ratio.

For continuous parameters, such as confidence_level, a better approach is to use RandomizedSearchCV and specify a continuous distribution to take full advantage of the randomization. We specify a continuous random variable that is uniformly distributed between 0 and 1:

randomized_search = RandomizedSearchCV(
    estimator=model_uncertainty,
    cv=cv,
    n_jobs=-1,
    param_distributions={
        "mu_uncertainty_set_estimator__confidence_level": uniform(loc=0, scale=1),
    },
    n_iter=50,
    scoring=make_scorer(RatioMeasure.CVAR_RATIO),
)
randomized_search.fit(X_train)
best_model_rs = randomized_search.best_estimator_

The selected confidence level is 58%.

Let’s plot the average out-of-sample score (CVaR ratio) as a function of the uncertainty set confidence level:

cv_results = randomized_search.cv_results_
x = np.asarray(
    cv_results["param_mu_uncertainty_set_estimator__confidence_level"]
).astype(float)
sort_idx = np.argsort(x)
y_test_mean = cv_results["mean_test_score"][sort_idx]
x = x[sort_idx]

fig = go.Figure(
    [
        go.Scatter(
            x=x,
            y=y_test_mean,
            name="Test",
            mode="lines",
            line=dict(color="rgb(255,165,0)"),
        ),
    ]
)
fig.add_vline(
    x=randomized_search.best_params_["mu_uncertainty_set_estimator__confidence_level"],
    line_width=2,
    line_dash="dash",
    line_color="green",
)
fig.update_layout(
    title="Test score",
    xaxis_title="Uncertainty Set Confidence Level",
    yaxis_title="CVaR Ratio",
)
fig.update_yaxes(tickformat=".3f")
fig.update_xaxes(tickformat=".0%")
show(fig)

Now, we analyze all three models on the test set. By using cross_val_predict with WalkForward, we are able to compute efficiently the MultiPeriodPortfolio composed of 60 days rolling portfolios fitted on the preceding 252 days:

pred_no_uncertainty = cross_val_predict(model_no_uncertainty, X_test, cv=cv)
pred_no_uncertainty.name = "No Uncertainty set"

pred_uncertainty = cross_val_predict(best_model, X_test, cv=cv, n_jobs=-1)
pred_uncertainty.name = "Uncertainty set - Grid Search"

pred_uncertainty_rs = cross_val_predict(best_model_rs, X_test, cv=cv, n_jobs=-1)
pred_uncertainty_rs.name = "Uncertainty set - Randomized Search"

population = Population([pred_no_uncertainty, pred_uncertainty, pred_uncertainty_rs])
population.plot_cumulative_returns()

From the plot and the below summary, we can see that the model without uncertainty set is overfitted and perform poorly on the test set. Its CVaR at 95% is 10% and its Mean/CVaR ratio is 0.006 which is the lowest of all models.

population.summary()

	No Uncertainty set	Uncertainty set - Grid Search	Uncertainty set - Randomized Search
Mean	0.074%	0.031%	0.024%
Annualized Mean	18.72%	7.83%	6.17%
Variance	0.21%	0.0099%	0.010%
Annualized Variance	53.11%	2.50%	2.58%
Semi-Variance	0.11%	0.0055%	0.0057%
Annualized Semi-Variance	27.86%	1.39%	1.44%
Standard Deviation	4.59%	1.00%	1.01%
Annualized Standard Deviation	72.88%	15.80%	16.07%
Semi-Deviation	3.32%	0.74%	0.76%
Annualized Semi-Deviation	52.78%	11.81%	12.01%
Mean Absolute Deviation	3.38%	0.69%	0.70%
CVaR at 95%	10.76%	2.45%	2.49%
EVaR at 95%	18.51%	4.85%	4.82%
Worst Realization	34.49%	9.22%	9.10%
CDaR at 95%	97.76%	18.91%	18.98%
MAX Drawdown	107.74%	39.30%	39.37%
Average Drawdown	46.40%	4.92%	5.64%
EDaR at 95%	99.80%	26.38%	26.53%
First Lower Partial Moment	1.69%	0.34%	0.35%
Ulcer Index	0.56	0.069	0.074
Gini Mean Difference	4.93%	1.01%	1.03%
Value at Risk at 95%	7.56%	1.50%	1.53%
Drawdown at Risk at 95%	91.92%	13.89%	14.12%
Entropic Risk Measure at 95%	3.00	3.00	3.00
Fourth Central Moment	0.0029%	0.000013%	0.000014%
Fourth Lower Partial Moment	0.0019%	0.000009%	0.000009%
Skew	-32.05%	-63.45%	-62.42%
Kurtosis	655.19%	1343.28%	1287.71%
Sharpe Ratio	0.016	0.031	0.024
Annualized Sharpe Ratio	0.26	0.50	0.38
Sortino Ratio	0.022	0.042	0.032
Annualized Sortino Ratio	0.35	0.66	0.51
Mean Absolute Deviation Ratio	0.022	0.045	0.035
First Lower Partial Moment Ratio	0.044	0.090	0.070
Value at Risk Ratio at 95%	0.0098	0.021	0.016
CVaR Ratio at 95%	0.0069	0.013	0.0098
Entropic Risk Measure Ratio at 95%	0.00025	0.00010	0.000082
EVaR Ratio at 95%	0.0040	0.0064	0.0051
Worst Realization Ratio	0.0022	0.0034	0.0027
Drawdown at Risk Ratio at 95%	0.00081	0.0022	0.0017
CDaR Ratio at 95%	0.00076	0.0016	0.0013
Calmar Ratio	0.00069	0.00079	0.00062
Average Drawdown Ratio	0.0016	0.0063	0.0043
EDaR Ratio at 95%	0.00074	0.0012	0.00092
Ulcer Index Ratio	0.0013	0.0045	0.0033
Gini Mean Difference Ratio	0.015	0.031	0.024
Portfolios Number	28	28	28
Avg nb of Assets per Portfolio	64.0	64.0	64.0

Finally, let’s plot the composition of the regularized multi-period portfolio:

pred_uncertainty.plot_composition()

Total running time of the script: (3 minutes 8.901 seconds)

Gallery generated by Sphinx-Gallery