Minimize CVaR on Stressed Factors#

This tutorial shows how to bridge scenario generation, factor models and portfolio optimization.

In the previous tutorial, we demonstrated how to generate conditional (stressed) synthetic returns using the VineCopula estimator.

Using the MeanRisk optimization, you could directly minimize the CVaR of your portfolio based on synthetic returns sampled from a given model (Vine Copula, GAN, VAE, etc.). However, in practice, we often need to perform cross-validation, portfolio rebalancing, and hyperparameter tuning. To facilitate this, we require a unified model that integrates synthetic data generation and optimization. This is exactly the role of the SyntheticData estimator, which bridges scenario generation, factor models and portfolio optimization.

There are several reasons why you might choose to run optimization on (factor) synthetic data rather than (factor) historical data:

- Historical data is often limited, especially in the tails, which can make it challenging to model extreme events accurately. Using parametric copulas to explicitly capture tail dependencies allows for better extrapolation of joint extreme events. By generating a larger sample of returns from Vine Copulas, you improve the accuracy of capturing tail co-dependencies during the optimization process.
- Synthetic data can be sampled conditionally, so you can build portfolios optimized for specific stressed scenarios.
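
To make the first point concrete, here is a minimal sketch of how the sample size affects an empirical CVaR estimate. It is not skfolio code: `empirical_cvar` and `sample_returns` are hypothetical helpers, and the returns are drawn from a plain Gaussian with 1% volatility purely for illustration. The idea is only that the average of the worst 5% of outcomes rests on very few points when the sample is small.

```python
import random
import statistics


def empirical_cvar(returns, beta=0.95):
    """Average loss over the worst (1 - beta) share of scenarios (simple estimator)."""
    losses = sorted((-r for r in returns), reverse=True)  # largest loss first
    n_tail = max(1, round(len(losses) * (1 - beta)))
    return statistics.fmean(losses[:n_tail])


rng = random.Random(0)


def sample_returns(n):
    # Hypothetical daily returns: Gaussian with 1% volatility (illustration only).
    return [rng.gauss(0.0, 0.01) for _ in range(n)]


# Spread of the CVaR estimate across repeated draws, for two sample sizes.
small = [empirical_cvar(sample_returns(250)) for _ in range(50)]
large = [empirical_cvar(sample_returns(10_000)) for _ in range(50)]

print(f"250 obs:    std of CVaR estimate = {statistics.stdev(small):.5f}")
print(f"10,000 obs: std of CVaR estimate = {statistics.stdev(large):.5f}")
```

The estimate built from 10,000 draws varies much less between repetitions. Keep in mind that a larger synthetic sample only reduces sampling noise relative to the fitted model; it does not correct a misspecified model.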

Data#

We load the S&P 500 dataset composed of the daily prices of 20 assets from the SPX Index composition and the Factors dataset composed of the daily prices of 5 ETFs representing common factors.

from plotly.io import show
from sklearn.model_selection import train_test_split

from skfolio import Population, RiskMeasure
from skfolio.datasets import load_factors_dataset, load_sp500_dataset
from skfolio.distribution import VineCopula
from skfolio.model_selection import WalkForward, cross_val_predict
from skfolio.optimization import MeanRisk
from skfolio.preprocessing import prices_to_returns
from skfolio.prior import FactorModel, SyntheticData

prices = load_sp500_dataset()
factor_prices = load_factors_dataset()

X, factors = prices_to_returns(prices, factor_prices)
X_train, X_test, factors_train, factors_test = train_test_split(
    X, factors, test_size=0.33, shuffle=False
)
print(factors_train.tail())

                MTUM      QUAL      SIZE      USMV      VLUE
Date
2020-01-06  0.001112  0.002372  0.001443  0.001373 -0.001556
2020-01-07 -0.002463 -0.001380 -0.000520 -0.004099  0.001447
2020-01-08  0.005328  0.004043  0.003295  0.002123  0.002248
2020-01-09  0.008698  0.008157  0.004602  0.006257  0.001787
2020-01-10  0.000000 -0.001366 -0.003462 -0.000450 -0.004245

print("Shapes:")
print(f"X_train: {X_train.shape}")
print(f"X_test: {X_test.shape}")
print(f"factors_train: {factors_train.shape}")
print(f"factors_test: {factors_test.shape}")

Shapes:
X_train: (1516, 20)
X_test: (747, 20)
factors_train: (1516, 5)
factors_test: (747, 5)
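
The shapes above follow from a chronological split: with `shuffle=False`, the earliest rows go to the training set and the most recent rows to the test set. The sketch below reproduces the row counts with the standard library only. The rounding rule (test rows rounded up, training rows taking the remainder) is our assumption about how `train_test_split` resolves a fractional `test_size`; it is consistent with the printed shapes.

```python
import math

n_obs = 1516 + 747  # total rows, taken from the shapes printed above
test_size = 0.33

# Assumed rounding rule: test rows rounded up, train rows take the remainder.
n_test = math.ceil(test_size * n_obs)
n_train = n_obs - n_test

rows = list(range(n_obs))  # stand-in for the date-ordered index
train_rows, test_rows = rows[:n_train], rows[n_train:]

print(n_train, n_test)  # 1516 747
print(max(train_rows) < min(test_rows))  # True: no test row precedes a train row
```

Keeping the order intact matters for time series: the model is always evaluated on data that comes after the data it was fitted on.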

Minimize CVaR on Synthetic Data#

Let’s find the minimum CVaR portfolio on 10,000 synthetic returns generated from a Vine Copula fitted on the historical training set and evaluate it on the historical test set.

vine = VineCopula(log_transform=True, n_jobs=-1, random_state=0)
prior = SyntheticData(distribution_estimator=vine, n_samples=10_000)
model = MeanRisk(risk_measure=RiskMeasure.CVAR, prior_estimator=prior)

model.fit(X_train)
print(model.weights_)
ptf = model.predict(X_test)
# You can then perform a full analysis using the portfolio methods.

[2.99162826e-02 1.93701362e-13 1.90901214e-12 3.39682864e-12
05026620e-12 1.34261010e-11 1.17286213e-01 4.80394596e-02
74864160e-12 1.68249868e-01 1.28994920e-02 1.72990690e-12
55425033e-02 1.12472101e-01 7.60281813e-03 1.21264749e-01
74366076e-13 4.41866498e-12 1.77207727e-01 1.79518784e-01]
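
For reference, minimizing CVaR over a finite set of scenarios is commonly written as a linear program, following Rockafellar and Uryasev. The formulation below is a standard textbook form and not a statement about skfolio's internals. The symbols are our own notation: $r_s$ is the vector of asset returns in scenario $s$, $S$ is the number of scenarios (10,000 here), $\beta$ is the confidence level (95%), $\alpha$ is an auxiliary variable that plays the role of the VaR, and $u_s$ is the loss in excess of $\alpha$. We also assume a fully invested, long-only portfolio, which is consistent with the weights printed above.

```latex
\begin{aligned}
\min_{w,\,\alpha,\,u}\quad & \alpha + \frac{1}{(1-\beta)\,S}\sum_{s=1}^{S} u_s \\
\text{s.t.}\quad & u_s \ge -w^\top r_s - \alpha, \quad u_s \ge 0, \quad s = 1,\dots,S, \\
& \mathbf{1}^\top w = 1, \quad w \ge 0 .
\end{aligned}
```

Because the problem has one constraint pair per scenario, the number of synthetic samples directly drives the size of the optimization problem.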

Multi-period Portfolio#

Now let’s run a walk-forward analysis where we optimize the minimum CVaR portfolio on synthetic data generated from a Vine Copula fitted on one year (252 business days) of historical data and evaluate it on the following 3 months (60 business days) of data, repeating over the full history.

cv = WalkForward(train_size=252, test_size=60)
ptf = cross_val_predict(model, X_train, cv=cv)
ptf.summary()

Mean                                     0.041%
Annualized Mean                          10.24%
Variance                                0.0057%
Annualized Variance                       1.45%
Semi-Variance                           0.0033%
Annualized Semi-Variance                  0.82%
Standard Deviation                        0.76%
Annualized Standard Deviation            12.02%
Semi-Deviation                            0.57%
Annualized Semi-Deviation                 9.05%
Mean Absolute Deviation                   0.54%
CVaR at 95%                               1.91%
EVaR at 95%                               3.02%
Worst Realization                         4.78%
CDaR at 95%                              12.16%
MAX Drawdown                             16.12%
Average Drawdown                          3.48%
EDaR at 95%                              12.99%
First Lower Partial Moment                0.27%
Ulcer Index                               0.049
Gini Mean Difference                      0.79%
Value at Risk at 95%                      1.14%
Drawdown at Risk at 95%                  10.91%
Entropic Risk Measure at 95%               3.00
Fourth Central Moment                 0.000003%
Fourth Lower Partial Moment           0.000002%
Skew                                    -74.18%
Kurtosis                                836.07%
Sharpe Ratio                              0.054
Annualized Sharpe Ratio                    0.85
Sortino Ratio                             0.071
Annualized Sortino Ratio                   1.13
Mean Absolute Deviation Ratio             0.076
First Lower Partial Moment Ratio           0.15
Value at Risk Ratio at 95%                0.036
CVaR Ratio at 95%                         0.021
Entropic Risk Measure Ratio at 95%      0.00014
EVaR Ratio at 95%                         0.013
Worst Realization Ratio                  0.0085
Drawdown at Risk Ratio at 95%            0.0037
CDaR Ratio at 95%                        0.0033
Calmar Ratio                             0.0025
Average Drawdown Ratio                    0.012
EDaR Ratio at 95%                        0.0031
Ulcer Index Ratio                        0.0083
Gini Mean Difference Ratio                0.051
Portfolios Number                            21
Avg nb of Assets per Portfolio             20.0
dtype: object
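
The `Portfolios Number` of 21 reported above can be recovered by counting the windows by hand. The sketch below is a stand-in for the splitting logic, not the `WalkForward` implementation: `walk_forward_splits` is a hypothetical helper, and we assume a rolling (non-expanding) training window that moves forward by one test period and drops a final incomplete test window.

```python
def walk_forward_splits(n_obs, train_size, test_size):
    """Yield (train, test) index ranges for a rolling walk-forward scheme."""
    start = 0
    while start + train_size + test_size <= n_obs:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        start += test_size  # roll the window forward by one test period


splits = list(walk_forward_splits(n_obs=1516, train_size=252, test_size=60))
print(len(splits))  # 21
first_train, first_test = splits[0]
print(first_train[0], first_train[-1], first_test[0], first_test[-1])  # 0 251 252 311
```

Each of the 21 test windows produces one portfolio, fitted on the 252 observations that precede it.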

Combining Synthetic Data with Factor Model#

Now, let’s add another layer of complexity by incorporating a Factor Model while stressing the quality factor (QUAL) to a return of -20%. The model proceeds in four steps: it fits a Factor Model on historical data, fits a Vine Copula on the factor data, samples 10,000 stressed scenarios from the Vine, and finally projects these scenarios back to the asset universe using the Factor Model.

vine = VineCopula(
    log_transform=True, central_assets=["QUAL"], n_jobs=-1, random_state=0
)
factor_prior = SyntheticData(
    distribution_estimator=vine,
    n_samples=10_000,
    sample_args=dict(conditioning={"QUAL": -0.2}),
)
factor_model = FactorModel(factor_prior_estimator=factor_prior)

model = MeanRisk(risk_measure=RiskMeasure.CVAR, prior_estimator=factor_model)
model.fit(X_train, factors_train)
print(model.weights_)

ptf = model.predict(X_test)

[3.05997008e-12 4.83660000e-14 5.98192711e-13 2.33257593e-13
33519769e-13 2.78595612e-13 1.91043820e-13 2.12302096e-13
04948749e-13 1.80469336e-12 1.97777150e-13 2.26394445e-13
23808306e-14 2.02916579e-13 1.02234151e-11 2.88928444e-13
48854762e-02 9.15855977e-13 9.85114524e-01 1.21588149e-13]
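
The last step of the pipeline, projecting factor scenarios back to the asset universe, can be sketched as a linear map. This is a simplified illustration under our own assumptions: a linear factor model with per-asset intercepts and no residual noise, with made-up loadings and scenarios. `project_to_assets` is a hypothetical helper, not a skfolio function.

```python
# Hypothetical loadings: 2 assets x 2 factors (rows = assets, columns = factors).
loadings = [
    [1.2, 0.3],
    [0.4, 0.9],
]
intercepts = [0.0001, 0.0002]  # hypothetical per-asset intercepts

# Hypothetical factor scenarios; the first factor is pinned at -20% in every row.
factor_scenarios = [
    [-0.20, -0.05],
    [-0.20, -0.12],
    [-0.20, -0.08],
]


def project_to_assets(scenarios, loadings, intercepts):
    """Map factor scenarios to asset scenarios via r = intercept + loadings @ f."""
    asset_scenarios = []
    for factors in scenarios:
        row = [
            a + sum(b * f for b, f in zip(betas, factors))
            for betas, a in zip(loadings, intercepts)
        ]
        asset_scenarios.append(row)
    return asset_scenarios


asset_scenarios = project_to_assets(factor_scenarios, loadings, intercepts)
for scenario in asset_scenarios:
    print([round(r, 4) for r in scenario])
```

The optimizer then sees only the projected asset scenarios, so an asset's stressed returns are driven entirely by its estimated exposure to the factors.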

Let’s show how to drill down into the model to retrieve the fitted Vine Copula and plot the marginal distributions of the stressed factors alongside the historical data. The stressed Momentum (MTUM), Size (SIZE), Low Volatility (USMV), and Value (VLUE) factors deviate significantly from their unstressed distributions, reflecting the impact of stressing the Quality (QUAL) factor. Note that the stressed distribution of the Quality factor is a Dirac delta (a point mass at -20%), since QUAL is fixed at that value in every sample.

fitted_vine = model.prior_estimator_.factor_prior_estimator_.distribution_estimator_
fig = fitted_vine.plot_marginal_distributions(factors, conditioning={"QUAL": -0.2})
show(fig)
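
To build intuition for why conditioning on one factor shifts the others and collapses the conditioned factor to a point mass, here is a minimal sketch that uses a bivariate Gaussian as a stand-in. This is deliberately not a Vine Copula: the Gaussian has no tail dependence, and all parameters are hypothetical. `sample_conditional` is an illustrative helper.

```python
import math
import random


def sample_conditional(n, x_value, mu_y, sigma_x, sigma_y, rho, mu_x=0.0, seed=0):
    """Draw (x, y) pairs from a bivariate Gaussian conditioned on x == x_value."""
    rng = random.Random(seed)
    cond_mean = mu_y + rho * sigma_y / sigma_x * (x_value - mu_x)
    cond_std = sigma_y * math.sqrt(1.0 - rho**2)
    return [(x_value, rng.gauss(cond_mean, cond_std)) for _ in range(n)]


# Hypothetical parameters: two factors with 1% daily volatility and 0.8 correlation.
samples = sample_conditional(
    n=10_000, x_value=-0.2, mu_y=0.0, sigma_x=0.01, sigma_y=0.01, rho=0.8
)

xs = {x for x, _ in samples}
mean_y = sum(y for _, y in samples) / len(samples)
print(xs)  # {-0.2}: the conditioned variable is a point mass
print(round(mean_y, 3))  # -0.16, i.e. rho * sigma_y / sigma_x * x_value
```

In the Gaussian case the other variable shifts linearly with the stress level. A copula with explicit tail dependence can produce a different, typically stronger, joint response in the extremes.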

Factor Stress Test#

Finally, let’s stress-test the portfolio by stressing the quality factor further, to a return of -50%.

factor_model.set_params(
    factor_prior_estimator__sample_args=dict(conditioning={"QUAL": -0.5})
)
# Refit the factor model on the full dataset to update the stressed scenarios
factor_model.fit(X, factors)
stressed_X = factor_model.prior_model_.returns

stressed_ptf = model.predict(stressed_X)

ptf.name = "Unstressed Ptf"
stressed_ptf.name = "Stressed Ptf"
population = Population([ptf, stressed_ptf])
summary = population.summary()
summary.loc[
    ["Mean", "Standard Deviation", "CVaR at 95%", "EVaR at 95%", "Worst Realization"]
]

                    Unstressed Ptf  Stressed Ptf
Mean                        0.050%       -26.31%
Standard Deviation           1.62%         4.96%
CVaR at 95%                  3.59%        33.07%
EVaR at 95%                  6.77%        35.76%
Worst Realization           11.13%        43.53%

population.plot_returns_distribution()
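
Conceptually, the stress test keeps the weights fixed and re-evaluates them on a different set of scenarios. The sketch below shows that mechanic with made-up numbers: the weights and both scenario sets are hypothetical, and `portfolio_returns` is an illustrative helper rather than a skfolio function.

```python
weights = [0.6, 0.4]  # hypothetical fixed weights (already optimized elsewhere)

# Hypothetical asset return scenarios (rows = scenarios, columns = assets).
baseline = [[0.010, -0.004], [-0.006, 0.002], [0.003, 0.001], [-0.012, -0.008]]
stressed = [[-0.30, -0.22], [-0.26, -0.31], [-0.35, -0.18], [-0.28, -0.25]]


def portfolio_returns(scenarios, weights):
    """One portfolio return per scenario: the weighted sum of asset returns."""
    return [sum(w * r for w, r in zip(weights, row)) for row in scenarios]


for name, scenarios in [("baseline", baseline), ("stressed", stressed)]:
    ptf_returns = portfolio_returns(scenarios, weights)
    mean = sum(ptf_returns) / len(ptf_returns)
    worst = -min(ptf_returns)  # worst realization reported as a positive loss
    print(f"{name}: mean={mean:.4f}, worst realization={worst:.4f}")
```

Since the weights do not change, any difference between the two summaries comes from the scenarios alone.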

Conclusion#

In this tutorial, we demonstrated how to bridge scenario generation, factor models, and portfolio optimization.
