Uncertainty Set

This tutorial shows how to incorporate expected returns uncertainty sets into the MeanRisk optimization.

With a mu uncertainty set estimator, the assets' expected returns are modelled with an ellipsoidal uncertainty set. This approach, known as worst-case optimization, falls under the umbrella of robust optimization. It reduces the instability that arises from estimation errors in the expected returns.

The worst-case portfolio expected return is:

\[w^T\hat{\mu} - \kappa_{\mu}\lVert S_{\mu}^\frac{1}{2}w\rVert_{2}\]

with \(\kappa_{\mu}\) the size of the ellipsoid (confidence region) and \(S_{\mu}\) its shape.
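
As a minimal numeric sketch of the formula above, the snippet below evaluates the worst-case expected return for a hypothetical two-asset portfolio in plain Python. The weights, expected returns, shape matrix and ellipsoid size are made-up illustrative values, and `worst_case_return` is a hypothetical helper that is not part of skfolio. It relies on the identity \(\lVert S_{\mu}^\frac{1}{2}w\rVert_{2} = \sqrt{w^T S_{\mu} w}\).

```python
import math


def worst_case_return(w, mu, s, kappa):
    """Return w^T mu - kappa * sqrt(w^T S w) for plain Python lists."""
    n = len(w)
    nominal = sum(w[i] * mu[i] for i in range(n))
    quad = sum(w[i] * s[i][j] * w[j] for i in range(n) for j in range(n))
    return nominal - kappa * math.sqrt(quad)


# Illustrative values only (not estimated from the FTSE 100 data).
w = [0.5, 0.5]
mu_hat = [0.10, 0.06]
s_mu = [[0.0004, 0.0], [0.0, 0.0001]]
kappa_mu = 2.0

nominal_return = sum(wi * mi for wi, mi in zip(w, mu_hat))
robust_return = worst_case_return(w, mu_hat, s_mu, kappa_mu)
print(f"nominal: {nominal_return:.4f}, worst case: {robust_return:.4f}")
```

The penalty term grows with both the ellipsoid size and the portfolio's exposure to poorly estimated assets, so the worst-case return is never above the nominal one.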

In this example, we will use a Mean-CVaR model with an EmpiricalMuUncertaintySet estimator.

Note that other uncertainty sets can be used, for example BootstrapMuUncertaintySet.

Data

We load the FTSE 100 dataset composed of the daily prices of 64 assets from the FTSE 100 Index composition starting from 2000-01-04 up to 2023-05-31:

import numpy as np
import plotly.graph_objects as go
from plotly.io import show
from scipy.stats import uniform
from sklearn import clone
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV, train_test_split

from skfolio import PerfMeasure, Population, RatioMeasure, RiskMeasure
from skfolio.datasets import load_ftse100_dataset
from skfolio.metrics import make_scorer
from skfolio.model_selection import WalkForward, cross_val_predict
from skfolio.optimization import MeanRisk, ObjectiveFunction
from skfolio.preprocessing import prices_to_returns
from skfolio.uncertainty_set import EmpiricalMuUncertaintySet

prices = load_ftse100_dataset()

X = prices_to_returns(prices)
X_train, X_test = train_test_split(X, test_size=0.33, shuffle=False)

Efficient Frontier

First, we create a Mean-CVaR model to estimate the efficient frontier without an uncertainty set. We constrain the CVaR at 95% to be below 2% (the CVaR at 95% is the average loss over the worst 5% of daily returns in the period):

model = MeanRisk(
    risk_measure=RiskMeasure.CVAR,
    min_weights=-1,
    max_cvar=0.02,
    efficient_frontier_size=20,
    portfolio_params=dict(name="Mean-CVaR", tag="No Uncertainty Set"),
)
model.fit(X_train)
model.weights_.shape
(20, 64)
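
The CVaR used in the constraint above can be illustrated with a simplified empirical estimator: sort the returns, keep the worst (1 - beta) share, and average their losses. The sketch below uses a hypothetical helper, `empirical_cvar`, and made-up returns. It is an assumption-laden illustration and skfolio's own estimator may treat fractional tail sizes differently.

```python
import math


def empirical_cvar(returns, beta=0.95):
    """Average loss over the worst (1 - beta) share of observations."""
    n = len(returns)
    # Number of tail observations; round() guards against float noise.
    k = max(1, math.ceil(round((1 - beta) * n, 10)))
    worst = sorted(returns)[:k]
    return -sum(worst) / k


# Ten made-up daily returns; at beta=0.8 the tail holds the two worst days.
daily_returns = [0.01, -0.05, 0.02, 0.00, -0.03, 0.015, 0.005, -0.01, 0.02, 0.01]
cvar_80 = empirical_cvar(daily_returns, beta=0.8)
print(f"CVaR at 80%: {cvar_80:.2%}")
```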

Now, we create a robust (worst-case) Mean-CVaR model with an uncertainty set on the expected returns:

model_uncertainty = MeanRisk(
    risk_measure=RiskMeasure.CVAR,
    min_weights=-1,
    max_cvar=0.02,
    efficient_frontier_size=20,
    mu_uncertainty_set_estimator=EmpiricalMuUncertaintySet(confidence_level=0.60),
    portfolio_params=dict(name="Mean-CVaR", tag="Mu Uncertainty Set - 60%"),
)
model_uncertainty.fit(X_train)
model_uncertainty.weights_.shape
(20, 64)

Let’s plot both efficient frontiers on the training set:

population_train = model.predict(X_train) + model_uncertainty.predict(X_train)

population_train.plot_measures(
    x=RiskMeasure.CVAR,
    y=PerfMeasure.ANNUALIZED_MEAN,
    color_scale=RatioMeasure.ANNUALIZED_SHARPE_RATIO,
    hover_measures=[RiskMeasure.MAX_DRAWDOWN, RatioMeasure.ANNUALIZED_SORTINO_RATIO],
)


Hyper-Parameter Tuning

In this section, we consider a three-month rolling (60 business days) long-short allocation fitted on the preceding year of data (252 business days) that maximizes the portfolio return under a CVaR constraint. We use GridSearchCV to select the model parameters listed below on the training set, using walk-forward analysis with Mean/CVaR ratio scoring.
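
The rolling scheme described above can be sketched with index ranges: each model is fitted on 252 observations, evaluated on the next 60, and the window then moves forward by 60. The helper `walk_forward_splits` below is hypothetical and only illustrates the idea. It assumes that an incomplete final window is dropped, and it is not skfolio's WalkForward implementation.

```python
def walk_forward_splits(n_observations, train_size, test_size):
    """Yield (train_range, test_range) pairs of rolling windows."""
    start = 0
    while start + train_size + test_size <= n_observations:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        start += test_size  # roll forward by one test period


# 1000 observations is an arbitrary illustrative sample size.
splits = list(walk_forward_splits(n_observations=1000, train_size=252, test_size=60))
first_train, first_test = splits[0]
print(len(splits), first_train, first_test)
```

Because every test window only contains observations that come after its training window, the resulting scores are out-of-sample by construction.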

The model parameters to tune are:

  • max_cvar: CVaR target (upper constraint)

  • cvar_beta: CVaR confidence level

  • confidence_level: Mu uncertainty set confidence level of the EmpiricalMuUncertaintySet

To reach a parameter of a nested estimator in GridSearchCV, join the names with a double underscore: mu_uncertainty_set_estimator__confidence_level
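
The double-underscore convention can be pictured as a path through nested objects. The toy classes and the `set_nested_param` helper below are hypothetical stand-ins written for illustration only. They mimic the routing idea in plain Python and are not the scikit-learn or skfolio implementation.

```python
class Component:
    """Toy stand-in for a nested estimator such as an uncertainty set."""

    def __init__(self, confidence_level=0.95):
        self.confidence_level = confidence_level


class Model:
    """Toy stand-in for an estimator that holds another estimator."""

    def __init__(self, max_cvar=0.02, component=None):
        self.max_cvar = max_cvar
        self.component = component


def set_nested_param(obj, key, value):
    """Set `value` on `obj`, descending one level per '__' in `key`."""
    name, delimiter, sub_key = key.partition("__")
    if delimiter:
        set_nested_param(getattr(obj, name), sub_key, value)
    else:
        setattr(obj, name, value)


toy_model = Model(component=Component())
set_nested_param(toy_model, "max_cvar", 0.03)
set_nested_param(toy_model, "component__confidence_level", 0.80)
print(toy_model.max_cvar, toy_model.component.confidence_level)
```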

model_no_uncertainty = MeanRisk(
    risk_measure=RiskMeasure.CVAR,
    objective_function=ObjectiveFunction.MAXIMIZE_RETURN,
    max_cvar=0.02,
    cvar_beta=0.9,
    min_weights=-1,
)

model_uncertainty = clone(model_no_uncertainty)
model_uncertainty.set_params(mu_uncertainty_set_estimator=EmpiricalMuUncertaintySet())

cv = WalkForward(train_size=252, test_size=60)

grid_search = GridSearchCV(
    estimator=model_uncertainty,
    cv=cv,
    n_jobs=-1,
    param_grid={
        "mu_uncertainty_set_estimator__confidence_level": [0.80, 0.90],
        "max_cvar": [0.03, 0.04, 0.05],
        "cvar_beta": [0.8, 0.9, 0.95],
    },
    scoring=make_scorer(RatioMeasure.CVAR_RATIO),
)
grid_search.fit(X_train)
best_model = grid_search.best_estimator_
print(best_model)
MeanRisk(cvar_beta=0.9, max_cvar=0.03, min_weights=-1,
         mu_uncertainty_set_estimator=EmpiricalMuUncertaintySet(confidence_level=0.8),
         objective_function=MAXIMIZE_RETURN, risk_measure=CVaR)

Among the 2x3x3 grid above, the optimal parameters are max_cvar=3%, cvar_beta=90% and an EmpiricalMuUncertaintySet confidence_level of 80%. These parameters achieved the highest mean out-of-sample Mean/CVaR ratio.
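
To make the size of the search explicit, the grid can be enumerated by hand with the standard library. This is only a sketch of the Cartesian product that a grid search evaluates. The ordering of candidates here is an assumption and may differ from the order scikit-learn uses internally.

```python
from itertools import product

param_grid = {
    "mu_uncertainty_set_estimator__confidence_level": [0.80, 0.90],
    "max_cvar": [0.03, 0.04, 0.05],
    "cvar_beta": [0.8, 0.9, 0.95],
}

# One candidate per combination of values: 2 x 3 x 3 = 18 in total.
names = list(param_grid)
candidates = [dict(zip(names, values)) for values in product(*param_grid.values())]
print(len(candidates))
print(candidates[0])
```

Each of these candidates is fitted and scored on every walk-forward split, so the total number of fits is the number of candidates times the number of splits.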

For continuous parameters, such as confidence_level, a better approach is to use RandomizedSearchCV and specify a continuous distribution to take full advantage of the randomization. We specify a continuous random variable that is uniformly distributed between 0 and 1:

randomized_search = RandomizedSearchCV(
    estimator=model_uncertainty,
    cv=cv,
    n_jobs=-1,
    param_distributions={
        "mu_uncertainty_set_estimator__confidence_level": uniform(loc=0, scale=1),
    },
    n_iter=50,
    scoring=make_scorer(RatioMeasure.CVAR_RATIO),
)
randomized_search.fit(X_train)
best_model_rs = randomized_search.best_estimator_

The selected confidence level is 58%.
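
The benefit of sampling from a continuous distribution can be sketched without any fitting: 50 uniform draws cover the whole interval, whereas a grid only visits the few values listed. The snippet uses the standard library `random` module as a stand-in for `scipy.stats.uniform`, and the seed is an arbitrary choice made for reproducibility.

```python
import random

rng = random.Random(0)  # fixed seed so the draws are reproducible
n_iter = 50
sampled_levels = sorted(rng.uniform(0.0, 1.0) for _ in range(n_iter))

print(
    f"{len(sampled_levels)} sampled levels between "
    f"{sampled_levels[0]:.3f} and {sampled_levels[-1]:.3f}"
)
```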

Let’s plot the average out-of-sample score (CVaR ratio) as a function of the uncertainty set confidence level:

cv_results = randomized_search.cv_results_
x = np.asarray(
    cv_results["param_mu_uncertainty_set_estimator__confidence_level"]
).astype(float)
sort_idx = np.argsort(x)
y_test_mean = cv_results["mean_test_score"][sort_idx]
x = x[sort_idx]

fig = go.Figure(
    [
        go.Scatter(
            x=x,
            y=y_test_mean,
            name="Test",
            mode="lines",
            line=dict(color="rgb(255,165,0)"),
        ),
    ]
)
fig.add_vline(
    x=randomized_search.best_params_["mu_uncertainty_set_estimator__confidence_level"],
    line_width=2,
    line_dash="dash",
    line_color="green",
)
fig.update_layout(
    title="Test score",
    xaxis_title="Uncertainty Set Confidence Level",
    yaxis_title="CVaR Ratio",
)
fig.update_yaxes(tickformat=".3f")
fig.update_xaxes(tickformat=".0%")
show(fig)

Now, we analyze all three models on the test set. By using cross_val_predict with WalkForward, we can efficiently compute the MultiPeriodPortfolio composed of 60-day rolling portfolios fitted on the preceding 252 days:

pred_no_uncertainty = cross_val_predict(model_no_uncertainty, X_test, cv=cv)
pred_no_uncertainty.name = "No Uncertainty set"

pred_uncertainty = cross_val_predict(best_model, X_test, cv=cv, n_jobs=-1)
pred_uncertainty.name = "Uncertainty set - Grid Search"

pred_uncertainty_rs = cross_val_predict(best_model_rs, X_test, cv=cv, n_jobs=-1)
pred_uncertainty_rs.name = "Uncertainty set - Randomized Search"

population = Population([pred_no_uncertainty, pred_uncertainty, pred_uncertainty_rs])
population.plot_cumulative_returns()


From the plot and the summary below, we can see that the model without an uncertainty set is overfitted and performs poorly on the test set. Its CVaR at 95% is 10.75% and its Mean/CVaR ratio is 0.0068, the lowest of all models.

population.summary()
                                    No Uncertainty set  Uncertainty set - Grid Search  Uncertainty set - Randomized Search
Mean                                0.073%              0.031%                         0.025%
Annualized Mean                     18.51%              7.83%                          6.18%
Variance                            0.21%               0.0099%                        0.010%
Annualized Variance                 53.03%              2.50%                          2.59%
Semi-Variance                       0.11%               0.0055%                        0.0057%
Annualized Semi-Variance            27.82%              1.39%                          1.45%
Standard Deviation                  4.59%               1.00%                          1.01%
Annualized Standard Deviation       72.82%              15.80%                         16.09%
Semi-Deviation                      3.32%               0.74%                          0.76%
Annualized Semi-Deviation           52.74%              11.81%                         12.03%
Mean Absolute Deviation             3.38%               0.69%                          0.70%
CVaR at 95%                         10.75%              2.45%                          2.49%
EVaR at 95%                         18.51%              4.85%                          4.82%
Worst Realization                   34.49%              9.22%                          9.09%
CDaR at 95%                         97.73%              18.91%                         18.98%
MAX Drawdown                        107.65%             39.30%                         39.38%
Average Drawdown                    46.42%              4.92%                          5.65%
EDaR at 95%                         99.75%              26.38%                         26.53%
First Lower Partial Moment          1.69%               0.34%                          0.35%
Ulcer Index                         0.56                0.069                          0.074
Gini Mean Difference                4.92%               1.01%                          1.03%
Value at Risk at 95%                7.56%               1.50%                          1.53%
Drawdown at Risk at 95%             91.94%              13.89%                         14.08%
Entropic Risk Measure at 95%        3.00                3.00                           3.00
Fourth Central Moment               0.0029%             0.000013%                      0.000014%
Fourth Lower Partial Moment         0.0019%             0.000009%                      0.000009%
Skew                                -32.20%             -63.45%                        -62.25%
Kurtosis                            655.94%             1343.28%                       1283.05%
Sharpe Ratio                        0.016               0.031                          0.024
Annualized Sharpe Ratio             0.25                0.50                           0.38
Sortino Ratio                       0.022               0.042                          0.032
Annualized Sortino Ratio            0.35                0.66                           0.51
Mean Absolute Deviation Ratio       0.022               0.045                          0.035
First Lower Partial Moment Ratio    0.044               0.090                          0.070
Value at Risk Ratio at 95%          0.0097              0.021                          0.016
CVaR Ratio at 95%                   0.0068              0.013                          0.0098
Entropic Risk Measure Ratio at 95%  0.00025             0.00010                        0.000082
EVaR Ratio at 95%                   0.0040              0.0064                         0.0051
Worst Realization Ratio             0.0021              0.0034                         0.0027
Drawdown at Risk Ratio at 95%       0.00080             0.0022                         0.0017
CDaR Ratio at 95%                   0.00075             0.0016                         0.0013
Calmar Ratio                        0.00068             0.00079                        0.00062
Average Drawdown Ratio              0.0016              0.0063                         0.0043
EDaR Ratio at 95%                   0.00074             0.0012                         0.00093
Ulcer Index Ratio                   0.0013              0.0045                         0.0033
Gini Mean Difference Ratio          0.015               0.031                          0.024
Portfolios Number                   28                  28                             28
Avg nb of Assets per Portfolio      64.0                64.0                           64.0
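
As a quick sanity check on the summary, the CVaR ratios can be approximately recomputed by dividing the reported daily Mean by the reported CVaR at 95%. The inputs below are the rounded figures from the summary, so the recomputed ratios may differ slightly from the reported ones in the last digit.

```python
# Rounded daily Mean and CVaR at 95% from the summary (both in percent).
summary = {
    "No Uncertainty set": (0.073, 10.75),
    "Uncertainty set - Grid Search": (0.031, 2.45),
    "Uncertainty set - Randomized Search": (0.025, 2.49),
}

# The percent units cancel out in the ratio.
cvar_ratios = {name: mean / cvar for name, (mean, cvar) in summary.items()}
for name, ratio in cvar_ratios.items():
    print(f"{name}: {ratio:.4f}")

lowest = min(cvar_ratios, key=cvar_ratios.get)
print(f"lowest ratio: {lowest}")
```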


Finally, let’s plot the composition of the multi-period portfolio obtained with the uncertainty set (grid search model):

pred_uncertainty.plot_composition()

