L1 and L2 Regularization

This tutorial shows how to incorporate regularization into the MeanRisk optimization.

Regularization tends to increase robustness and out-of-sample stability.

The l1_coef parameter is used to penalize the objective function by the L1 norm:

\[l1\_coef \times \Vert w \Vert_{1} = l1\_coef \times \sum_{i=1}^{N} |w_{i}|\]

and the l2_coef parameter is used to penalize the objective function by the L2 norm:

\[l2\_coef \times \Vert w \Vert_{2}^{2} = l2\_coef \times \sum_{i=1}^{N} w_{i}^2\]
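Both penalty terms can be computed directly from a weight vector. The NumPy sketch below does this. The helper names `l1_penalty` and `l2_penalty` are hypothetical and not part of skfolio, which adds these terms inside the optimization problem itself.

```python
import numpy as np


def l1_penalty(weights: np.ndarray, l1_coef: float) -> float:
    """Hypothetical helper: l1_coef * ||w||_1, the scaled sum of absolute weights."""
    return l1_coef * float(np.sum(np.abs(weights)))


def l2_penalty(weights: np.ndarray, l2_coef: float) -> float:
    """Hypothetical helper: l2_coef * ||w||_2^2, the scaled sum of squared weights."""
    return l2_coef * float(np.sum(weights**2))


# A made-up long-short portfolio: 130% long, 30% short, 100% net exposure.
w = np.array([0.8, 0.5, -0.3])
print(l1_penalty(w, 0.001))  # 0.001 * (0.8 + 0.5 + 0.3)
print(l2_penalty(w, 0.001))  # 0.001 * (0.64 + 0.25 + 0.09)
```

For a long-short portfolio, the L1 norm equals the gross exposure (here 160%). The L1 term therefore also acts as a penalty on leverage.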

Warning

Increasing the L1 coefficient may reduce the number of non-zero weights (cardinality), which can reduce diversification. However, a reduction in diversification does not necessarily equate to a reduction in robustness.

Note

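Increasing the L1 coefficient has no impact if the portfolio is long only and fully invested. In that case the weights are non-negative and sum to one, so the L1 norm of the weights is always 1 and the penalty is a constant.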
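Here is a quick NumPy check of this note. Random long-only weights that sum to one always have an L1 norm of exactly 1. Long-short weights with the same budget have a larger L1 norm, so only they are affected by the penalty. The portfolios are synthetic random draws used only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_assets = 64

# Long-only, fully invested portfolios: Dirichlet draws are non-negative and sum to 1.
long_only = rng.dirichlet(np.ones(n_assets), size=1_000)

# Long-short portfolios with the same budget: add noise (creating negative
# weights), then shift every weight equally so each row still sums to 1.
noisy = long_only + rng.normal(scale=0.05, size=long_only.shape)
long_short = noisy - (noisy.sum(axis=1, keepdims=True) - 1.0) / n_assets

l1_long_only = np.abs(long_only).sum(axis=1)
l1_long_short = np.abs(long_short).sum(axis=1)

print(l1_long_only.min(), l1_long_only.max())  # both equal 1 up to rounding
print(l1_long_short.min())  # above 1: short positions increase the L1 norm
```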

In this example, we use a dataset with a large number of assets and a long-short allocation to exacerbate overfitting.

First, we analyze the impact of regularization on the entire Mean-Variance efficient frontier and on its stability from the training set to the test set. Then, we show how to tune the regularization coefficients by cross-validation with GridSearchCV and RandomizedSearchCV.

Data

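We load the FTSE 100 dataset. It contains the daily prices of 64 assets from the FTSE 100 Index composition, from 2000-01-04 to 2023-05-31. We convert the prices to returns and split them chronologically: the first 67% of the observations go to training and the last 33% to testing.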

import numpy as np
import plotly.graph_objects as go
from plotly.io import show
from scipy.stats import loguniform
from sklearn import clone
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV, train_test_split

from skfolio import PerfMeasure, Population, RatioMeasure, RiskMeasure
from skfolio.datasets import load_ftse100_dataset
from skfolio.metrics import make_scorer
from skfolio.model_selection import WalkForward, cross_val_predict
from skfolio.optimization import EqualWeighted, MeanRisk, ObjectiveFunction
from skfolio.preprocessing import prices_to_returns

prices = load_ftse100_dataset()
X = prices_to_returns(prices)

X_train, X_test = train_test_split(X, test_size=0.33, shuffle=False)
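With shuffle=False, the split is purely chronological. Every test date comes after every training date, so no future information leaks into the fit. The dependency-free sketch below shows the same idea. `chronological_split` is a hypothetical helper, and rounding the test size up is an assumption about scikit-learn's convention for a float test_size.

```python
import math

import numpy as np


def chronological_split(X: np.ndarray, test_size: float = 0.33):
    """Hypothetical helper: first rows for training, last rows for testing.

    The test size is rounded up (assumed to mirror scikit-learn's
    train_test_split with shuffle=False and a float test_size).
    """
    n_samples = len(X)
    n_test = math.ceil(test_size * n_samples)
    n_train = n_samples - n_test
    return X[:n_train], X[n_train:]


days = np.arange(10)  # stand-in for 10 consecutive trading days
train, test = chronological_split(days)
print(train, test)  # train = days 0..5, test = days 6..9
```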

Efficient Frontier

First, we create a Mean-Variance model without regularization and estimate its efficient frontier. We allow short positions (min_weights=-1) and constrain the annualized volatility to stay below 30%.

model = MeanRisk(
    risk_measure=RiskMeasure.VARIANCE,
    min_weights=-1,
    max_variance=0.3**2 / 252,
    efficient_frontier_size=30,
    portfolio_params=dict(name="Mean-Variance", tag="No Regularization"),
)
model.fit(X_train)
model.weights_.shape
(30, 64)
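In the code above, max_variance is expressed in the units of the daily returns. The 30% annualized volatility target therefore has to be converted to a daily variance. The arithmetic below assumes 252 trading days per year, as in the code:

```python
import math

annual_vol = 0.30  # upper bound on annualized volatility
trading_days = 252  # trading days per year used for annualization

# Variance scales linearly with time, so the daily bound is the annual
# variance divided by the number of trading days.
max_variance = annual_vol**2 / trading_days
print(max_variance)  # about 0.000357

# Converting back: annualized volatility = sqrt(daily variance * 252).
print(math.sqrt(max_variance * trading_days))
```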

Now we create the two regularized models, one with l1_coef=0.001 and one with l2_coef=0.001:

model_l1 = MeanRisk(
    risk_measure=RiskMeasure.VARIANCE,
    min_weights=-1,
    max_variance=0.3**2 / 252,
    efficient_frontier_size=30,
    l1_coef=0.001,
    portfolio_params=dict(name="Mean-Variance", tag="L1 Regularization"),
)
model_l1.fit(X_train)

model_l2 = clone(model_l1)
model_l2.set_params(
    l1_coef=0,
    l2_coef=0.001,
    portfolio_params=dict(name="Mean-Variance", tag="L2 Regularization"),
)
model_l2.fit(X_train)
model_l2.weights_.shape
(30, 64)

Let’s plot the efficient frontiers on the training set:

population_train = (
    model.predict(X_train) + model_l1.predict(X_train) + model_l2.predict(X_train)
)

population_train.plot_measures(
    x=RiskMeasure.ANNUALIZED_STANDARD_DEVIATION,
    y=PerfMeasure.ANNUALIZED_MEAN,
    color_scale=RatioMeasure.ANNUALIZED_SHARPE_RATIO,
    hover_measures=[RiskMeasure.MAX_DRAWDOWN, RatioMeasure.ANNUALIZED_SORTINO_RATIO],
)


Prediction

The parameter efficient_frontier_size=30 means that, when we called the fit method, each model solved 30 optimizations along its efficient frontier. Each model's predict method therefore returns a Population of 30 portfolios. Adding the three populations together gives a Population of 90 portfolios:

population_test = (
    model.predict(X_test) + model_l1.predict(X_test) + model_l2.predict(X_test)
)

for tag in ["No Regularization", "L1 Regularization"]:

    print("=================")
    print(tag)
    print("=================")
    print(
        "Avg Sharpe Ratio Train:"
        f" {population_train.filter(tags=tag).measures_mean(measure=RatioMeasure.ANNUALIZED_SHARPE_RATIO):0.2f}"
    )
    print(
        "Avg Sharpe Ratio Test:"
        f" {population_test.filter(tags=tag).measures_mean(measure=RatioMeasure.ANNUALIZED_SHARPE_RATIO):0.2f}"
    )
    print(
        "Avg non-zeros assets:"
        f" {np.mean([len(ptf.nonzero_assets) for ptf in population_train.filter(tags=tag)]):0.2f}"
    )
    print("\n")

population_test.plot_measures(
    x=RiskMeasure.ANNUALIZED_STANDARD_DEVIATION,
    y=PerfMeasure.ANNUALIZED_MEAN,
    color_scale=RatioMeasure.ANNUALIZED_SHARPE_RATIO,
    hover_measures=[RiskMeasure.MAX_DRAWDOWN, RatioMeasure.ANNUALIZED_SORTINO_RATIO],
)
=================
No Regularization
=================
Avg Sharpe Ratio Train: 1.93
Avg Sharpe Ratio Test: 0.43
Avg non-zeros assets: 64.00


=================
L1 Regularization
=================
Avg Sharpe Ratio Train: 1.37
Avg Sharpe Ratio Test: 0.73
Avg non-zeros assets: 13.57


In this example, L1 regularization clearly reduced the average number of non-zero weights, from 64 to about 14, and made the model more robust. Compared with the regularized portfolios, the portfolios without regularization have a higher average Sharpe Ratio on the training set (1.93 vs 1.37) but a lower one on the test set (0.43 vs 0.73).
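A simplified, separable toy problem shows why L1 produces exact zeros while L2 does not. This is not the MeanRisk objective. The weights are pulled toward a hypothetical unpenalized solution v while their size is penalized. With an L1 penalty, the minimizer is the soft-thresholding rule, which sets every weight with |v_i| <= lambda to exactly zero. With an L2 penalty, all weights are only rescaled by 1 / (1 + 2 lambda).

```python
import numpy as np


def l1_shrink(v: np.ndarray, lam: float) -> np.ndarray:
    """Minimizer of 0.5 * ||w - v||^2 + lam * ||w||_1 (soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)


def l2_shrink(v: np.ndarray, lam: float) -> np.ndarray:
    """Minimizer of 0.5 * ||w - v||^2 + lam * ||w||_2^2 (uniform rescaling)."""
    return v / (1.0 + 2.0 * lam)


# Hypothetical unpenalized weights: a few large positions and many small ones.
v = np.array([0.60, -0.35, 0.40, 0.04, -0.03, 0.02, 0.01, -0.02])

w_l1 = l1_shrink(v, lam=0.05)
w_l2 = l2_shrink(v, lam=0.05)
print(np.count_nonzero(w_l1), np.count_nonzero(w_l2))  # 3 with L1, 8 with L2
```

The small positions are exactly the ones L1 removes. The lower cardinality observed above is consistent with this mechanism.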

Hyper-parameter Tuning

In this section, we consider a long-short allocation that maximizes the return under a 30% annualized volatility constraint. It is rebalanced every 3 months (60 business days) and, at each rebalancing, fitted on the preceding year of data (252 business days).
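The rolling scheme can be pictured as a sequence of index windows. The plain-Python sketch below assumes that each 60-day test window directly follows its 252-day training window and that both roll forward by the test size. `walk_forward_splits` is a hypothetical helper, and skfolio's WalkForward may handle an incomplete final window differently.

```python
def walk_forward_splits(n_samples: int, train_size: int = 252, test_size: int = 60):
    """Hypothetical helper yielding (train_indices, test_indices) windows.

    Each test window immediately follows its training window, and both roll
    forward by `test_size`. An incomplete trailing test window is dropped.
    """
    start = 0
    while start + train_size + test_size <= n_samples:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        start += test_size


splits = list(walk_forward_splits(1000))
print(len(splits))  # (1000 - 252) // 60 = 12 folds
first_train, first_test = splits[0]
print(first_train, first_test)  # range(0, 252) range(252, 312)
```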

We use GridSearchCV on the training set to select, by cross-validation, the L1 and L2 regularization coefficients that achieve the highest mean test score. We keep the default score, which is the Sharpe ratio. Finally, we evaluate the model on the test set and compare it with the equal-weighted benchmark and a reference model without regularization:

ref_model = MeanRisk(
    risk_measure=RiskMeasure.VARIANCE,
    objective_function=ObjectiveFunction.MAXIMIZE_RETURN,
    max_variance=0.3**2 / 252,
    min_weights=-1,
)

cv = WalkForward(train_size=252, test_size=60)

grid_search = GridSearchCV(
    estimator=ref_model,
    cv=cv,
    n_jobs=-1,
    param_grid={
        "l1_coef": [0.001, 0.01, 0.1],
        "l2_coef": [0.001, 0.01, 0.1],
    },
)
grid_search.fit(X_train)
best_model = grid_search.best_estimator_
print(best_model)
MeanRisk(l1_coef=0.1, l2_coef=0.01, max_variance=0.00035714285714285714,
         min_weights=-1, objective_function=MAXIMIZE_RETURN)

Among the 3x3 grid above, the optimal parameters are 0.1 for the L1 coefficient and 0.01 for the L2 coefficient. These are the values that achieved the highest mean out-of-sample Sharpe Ratio across the cross-validation folds. Note that the score can be changed to another measure or function using the scoring parameter.
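GridSearchCV evaluates every combination in param_grid. The sketch below reproduces the enumeration with itertools.product. It illustrates the idea and is not scikit-learn's internal code. It also makes the cost explicit. The fold count is a hypothetical value chosen for illustration.

```python
from itertools import product

param_grid = {
    "l1_coef": [0.001, 0.01, 0.1],
    "l2_coef": [0.001, 0.01, 0.1],
}

# Every combination of the listed values is a candidate: 3 x 3 = 9.
candidates = [
    dict(zip(param_grid, values)) for values in product(*param_grid.values())
]
print(len(candidates))  # 9
print(candidates[0])  # {'l1_coef': 0.001, 'l2_coef': 0.001}

# Each candidate is refit on every cross-validation fold.
n_folds = 12  # hypothetical fold count, for illustration only
print(len(candidates) * n_folds)  # 108 fits
```

The number of fits grows multiplicatively with the number of values per parameter. A randomized search instead caps the number of candidates at n_iter.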

For continuous parameters, such as L1 and L2 above, a better approach is to use RandomizedSearchCV and specify a continuous distribution to take full advantage of the randomization.

A continuous log-uniform random variable is the continuous version of a log-spaced parameter. For example, to specify the equivalent of the L1 parameter from above, loguniform(1e-3, 1e-1) can be used instead of [0.001, 0.01, 0.1].
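A log-uniform draw is equivalent to drawing the base-10 exponent uniformly. The NumPy sketch below reimplements that idea for illustration; in the tutorial itself, scipy.stats.loguniform does the sampling.

```python
import numpy as np

rng = np.random.default_rng(42)
low, high = 1e-3, 1e-1

# Sample log10(x) uniformly in [log10(low), log10(high)], then exponentiate.
samples = 10 ** rng.uniform(np.log10(low), np.log10(high), size=100_000)

# Each decade gets about the same share of samples. A plain uniform draw on
# [1e-3, 1e-1] would instead put about 90% of them above 1e-2.
share_first_decade = np.mean(samples < 1e-2)
print(f"{share_first_decade:.2f}")  # close to 0.50
```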

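Mirroring the grid search example above, we now tune only the L2 coefficient. We draw it from a continuous random variable that is log-uniformly distributed between 1e-3 and 1e-1. We sample 100 candidates, record the training scores so they can be plotted, and score explicitly with the annualized Sharpe ratio: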

randomized_search = RandomizedSearchCV(
    estimator=ref_model,
    cv=cv,
    n_jobs=-1,
    param_distributions={
        "l2_coef": loguniform(1e-3, 1e-1),
    },
    n_iter=100,
    return_train_score=True,
    scoring=make_scorer(RatioMeasure.ANNUALIZED_SHARPE_RATIO),
)
randomized_search.fit(X_train)
best_model_rd = randomized_search.best_estimator_
print(best_model_rd)
MeanRisk(l2_coef=np.float64(0.022885763179068888),
         max_variance=0.00035714285714285714, min_weights=-1,
         objective_function=MAXIMIZE_RETURN)

Let’s plot both the average in-sample and out-of-sample scores (annualized Sharpe ratio) as a function of l2_coef:

cv_results = randomized_search.cv_results_
x = np.asarray(cv_results["param_l2_coef"]).astype(float)
sort_idx = np.argsort(x)
y_train_mean = cv_results["mean_train_score"][sort_idx]
y_train_std = cv_results["std_train_score"][sort_idx]
y_test_mean = cv_results["mean_test_score"][sort_idx]
y_test_std = cv_results["std_test_score"][sort_idx]
x = x[sort_idx]

fig = go.Figure(
    [
        go.Scatter(
            x=x,
            y=y_train_mean,
            name="Train",
            mode="lines",
            line=dict(color="rgb(31, 119, 180)"),
        ),
        go.Scatter(
            x=x,
            y=y_train_mean + y_train_std,
            mode="lines",
            line=dict(width=0),
            showlegend=False,
        ),
        go.Scatter(
            x=x,
            y=y_train_mean - y_train_std,
            mode="lines",
            line=dict(width=0),
            showlegend=False,
            fillcolor="rgba(31, 119, 180,0.15)",
            fill="tonexty",
        ),
        go.Scatter(
            x=x,
            y=y_test_mean,
            name="Test",
            mode="lines",
            line=dict(color="rgb(255,165,0)"),
        ),
        go.Scatter(
            x=x,
            y=y_test_mean + y_test_std,
            mode="lines",
            line=dict(width=0),
            showlegend=False,
        ),
        go.Scatter(
            x=x,
            y=y_test_mean - y_test_std,
            line=dict(width=0),
            mode="lines",
            fillcolor="rgba(255,165,0, 0.15)",
            fill="tonexty",
            showlegend=False,
        ),
    ]
)
fig.add_vline(
    x=randomized_search.best_params_["l2_coef"],
    line_width=2,
    line_dash="dash",
    line_color="green",
)
fig.update_layout(
    title="Train/Test score",
    xaxis_title="L2 Coef",
    yaxis_title="Annualized Sharpe Ratio",
)
fig.update_yaxes(tickformat=".2f")
show(fig)

The highest mean out-of-sample Sharpe Ratio is 1.55, achieved for an L2 coefficient of 0.023. Also note that with the weakest regularization (the smallest L2 coefficients), the mean train Sharpe Ratio is around six times higher than the mean test Sharpe Ratio, a clear indication of overfitting.

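Now, we analyze three models on the test set: the equal-weighted benchmark, the reference model without regularization, and the best model from the grid search. Using cross_val_predict with WalkForward, we can efficiently compute each MultiPeriodPortfolio. Each one is composed of rolling 60-day portfolios, each fitted on the preceding 252 days: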

benchmark = EqualWeighted()
pred_bench = cross_val_predict(benchmark, X_test, cv=cv)
pred_bench.name = "Benchmark"

pred_no_reg = cross_val_predict(ref_model, X_test, cv=cv)
pred_no_reg.name = "No Regularization"

pred_reg = cross_val_predict(best_model, X_test, cv=cv, n_jobs=-1)
pred_reg.name = "Regularization"

population = Population([pred_no_reg, pred_reg, pred_bench])
population.plot_cumulative_returns()
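The sketch below shows, without skfolio, what the multi-period backtest above stitches together. The model is fitted on each 252-day window, its weights are applied to the following 60 days, and the test-window returns are concatenated. The helpers `walk_forward_returns` and `equal_weights` are hypothetical and the returns are synthetic. This simplified sketch holds the weights fixed within each test window and is not skfolio's implementation.

```python
import numpy as np


def walk_forward_returns(returns, fit_weights, train_size=252, test_size=60):
    """Hypothetical helper: concatenate out-of-sample portfolio returns.

    `fit_weights` maps a (train_size, n_assets) return matrix to a weight
    vector. Each weight vector is applied only to the test window that follows
    its training window, so every stitched return is out-of-sample.
    """
    n_samples = returns.shape[0]
    chunks = []
    start = 0
    while start + train_size + test_size <= n_samples:
        train = returns[start : start + train_size]
        test = returns[start + train_size : start + train_size + test_size]
        weights = fit_weights(train)
        chunks.append(test @ weights)
        start += test_size
    return np.concatenate(chunks)


def equal_weights(train: np.ndarray) -> np.ndarray:
    """Hypothetical equal-weighted stand-in for an optimizer (ignores the data)."""
    n_assets = train.shape[1]
    return np.full(n_assets, 1.0 / n_assets)


rng = np.random.default_rng(0)
X = rng.normal(0.0005, 0.01, size=(1000, 5))  # synthetic daily returns
oos = walk_forward_returns(X, equal_weights)
print(oos.shape)  # (720,): 12 test windows of 60 days
```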


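From the plot and the summary below, we can see that the unregularized model is overfitted and performs poorly on the test set. Its annualized volatility is 54%, far above the model's 30% upper bound, and its annualized Sharpe Ratio of 0.32 is the lowest of all models. The regularized model reaches an annualized Sharpe Ratio of 0.55, close to the equal-weighted benchmark's 0.58, and its annualized volatility of 18% stays within the 30% bound.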

population.summary()
Measure                             No Regularization  Regularization  Benchmark
Mean                                0.068%             0.040%          0.040%
Annualized Mean                     17.10%             9.96%           9.97%
Variance                            0.12%              0.013%          0.012%
Annualized Variance                 29.25%             3.29%           2.95%
Semi-Variance                       0.063%             0.0072%         0.0063%
Annualized Semi-Variance            15.75%             1.81%           1.59%
Standard Deviation                  3.41%              1.14%           1.08%
Annualized Standard Deviation       54.08%             18.14%          17.18%
Semi-Deviation                      2.50%              0.85%           0.79%
Annualized Semi-Deviation           39.69%             13.45%          12.59%
Mean Absolute Deviation             2.48%              0.79%           0.72%
CVaR at 95%                         7.97%              2.78%           2.62%
EVaR at 95%                         15.46%             5.21%           5.57%
Worst Realization                   29.19%             9.72%           10.72%
CDaR at 95%                         84.11%             21.84%          22.71%
MAX Drawdown                        95.45%             46.08%          42.12%
Average Drawdown                    31.81%             5.10%           4.50%
EDaR at 95%                         86.40%             31.35%          29.55%
First Lower Partial Moment          1.24%              0.39%           0.36%
Ulcer Index                         0.39               0.075           0.073
Gini Mean Difference                3.62%              1.16%           1.08%
Value at Risk at 95%                5.36%              1.65%           1.51%
Drawdown at Risk at 95%             77.78%             14.36%          14.87%
Entropic Risk Measure at 95%        3.00               3.00            3.00
Fourth Central Moment               0.0012%            0.000021%       0.000020%
Fourth Lower Partial Moment         0.00084%           0.000013%       0.000013%
Skew                                -53.64%            -55.34%         -53.39%
Kurtosis                            860.55%            1225.28%        1486.48%
Sharpe Ratio                        0.020              0.035           0.037
Annualized Sharpe Ratio             0.32               0.55            0.58
Sortino Ratio                       0.027              0.047           0.050
Annualized Sortino Ratio            0.43               0.74            0.79
Mean Absolute Deviation Ratio       0.027              0.050           0.055
First Lower Partial Moment Ratio    0.055              0.10            0.11
Value at Risk Ratio at 95%          0.013              0.024           0.026
CVaR Ratio at 95%                   0.0085             0.014           0.015
Entropic Risk Measure Ratio at 95%  0.00023            0.00013         0.00013
EVaR Ratio at 95%                   0.0044             0.0076          0.0071
Worst Realization Ratio             0.0023             0.0041          0.0037
Drawdown at Risk Ratio at 95%       0.00087            0.0028          0.0027
CDaR Ratio at 95%                   0.00081            0.0018          0.0017
Calmar Ratio                        0.00071            0.00086         0.00094
Average Drawdown Ratio              0.0021             0.0078          0.0088
EDaR Ratio at 95%                   0.00079            0.0013          0.0013
Ulcer Index Ratio                   0.0017             0.0053          0.0054
Gini Mean Difference Ratio          0.019              0.034           0.037
Portfolios Number                   28                 28              28
Avg nb of Assets per Portfolio      64.0               64.0            64.0
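The annualized rows of the summary follow from the daily rows by the square-root-of-time rule, assuming 252 trading days per year (the same convention behind max_variance). The check below uses the values reported for the unregularized model:

```python
import math

trading_days = 252

# Daily values reported for the unregularized model in the summary table.
daily_std = 0.0341  # Standard Deviation: 3.41%
daily_sharpe = 0.020  # Sharpe Ratio

# Standard deviation and Sharpe ratio both scale with the square root of time.
annualized_std = daily_std * math.sqrt(trading_days)
annualized_sharpe = daily_sharpe * math.sqrt(trading_days)
print(f"{annualized_std:.2%}")  # 54.13%: the table's 54.08% uses the unrounded daily value
print(f"{annualized_sharpe:.2f}")  # 0.32
```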


Finally, we plot the composition of the regularized multi-period portfolio:

pred_reg.plot_composition()


Total running time of the script: (1 minutes 12.735 seconds)
