Factor Model#
This tutorial shows how to use the FactorModel estimator in the MeanRisk optimization.
A prior estimator fits a PriorModel containing the distribution estimate of asset returns. It represents the investor’s prior beliefs about the model used to estimate this distribution.
The PriorModel is a dataclass containing:
mu: Expected returns estimation
covariance: Covariance matrix estimation
returns: Asset returns estimation
cholesky: Lower-triangular Cholesky factor of the covariance estimation (optional)
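To make the four fields concrete, here is a minimal sketch of a container with the same field names, filled from sample moments of synthetic returns. The `SimplePrior` class and the `empirical_prior` helper are hypothetical stand-ins written for illustration; they are not the skfolio `PriorModel` or any skfolio estimator.

```python
from dataclasses import dataclass
from typing import Optional

import numpy as np


@dataclass(frozen=True)
class SimplePrior:
    """Hypothetical stand-in mirroring the fields listed above."""

    mu: np.ndarray  # expected returns, shape (n_assets,)
    covariance: np.ndarray  # covariance matrix, shape (n_assets, n_assets)
    returns: np.ndarray  # return observations, shape (n_observations, n_assets)
    cholesky: Optional[np.ndarray] = None  # optional lower-triangular factor


def empirical_prior(returns: np.ndarray) -> SimplePrior:
    """Fill the container with plain sample moments (illustrative helper)."""
    mu = returns.mean(axis=0)
    covariance = np.cov(returns, rowvar=False)
    # The Cholesky factor L is lower-triangular and satisfies L @ L.T == covariance.
    cholesky = np.linalg.cholesky(covariance)
    return SimplePrior(mu=mu, covariance=covariance, returns=returns, cholesky=cholesky)


# Synthetic daily returns for 4 assets (made-up numbers, not market data).
rng = np.random.default_rng(0)
sample_returns = rng.normal(0.0005, 0.01, size=(250, 4))
prior = empirical_prior(sample_returns)
print(prior.mu.shape, prior.covariance.shape, prior.cholesky.shape)
```

The Cholesky factor is optional because it can always be recomputed from the covariance; storing it simply avoids repeating the factorization downstream.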
The FactorModel estimator estimates the PriorModel using a factor model and a prior estimator of the factors’ returns. The purpose of factor models is to impose a structure on financial variables and their covariance matrix by explaining them through a small number of common factors. This can help overcome estimation error by reducing the number of parameters, i.e., the dimensionality of the estimation problem, making portfolio optimization more robust against noise in the data. Factor models also provide a decomposition of financial risk into systematic and security-specific components.
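The structure described above can be sketched with NumPy on synthetic data. This is a textbook linear factor model fitted by ordinary least squares, under the assumption of uncorrelated residuals; it is not a reproduction of the skfolio implementation, and all variable names and numbers are illustrative.

```python
import numpy as np

rng = np.random.default_rng(42)
n_obs, n_assets, n_factors = 500, 20, 5

# Synthetic data: factor returns, "true" loadings, and idiosyncratic noise.
factor_returns = rng.normal(0.0004, 0.01, size=(n_obs, n_factors))
true_loadings = rng.normal(1.0, 0.3, size=(n_assets, n_factors))
noise = rng.normal(0.0, 0.005, size=(n_obs, n_assets))
asset_returns = factor_returns @ true_loadings.T + noise

# Ordinary least squares with an intercept, solved for all assets at once.
design = np.column_stack([np.ones(n_obs), factor_returns])
coef, *_ = np.linalg.lstsq(design, asset_returns, rcond=None)
intercepts = coef[0]  # shape (n_assets,)
loadings = coef[1:].T  # shape (n_assets, n_factors)

# Moments of the factors and residual (security-specific) variances.
factor_mu = factor_returns.mean(axis=0)
factor_cov = np.cov(factor_returns, rowvar=False)
residuals = asset_returns - design @ coef
residual_var = residuals.var(axis=0, ddof=n_factors + 1)

# Asset moments implied by the factor structure:
# systematic part (loadings and factor moments) plus a diagonal specific part.
asset_mu = loadings @ factor_mu + intercepts
asset_cov = loadings @ factor_cov @ loadings.T + np.diag(residual_var)

# Free parameters: full covariance matrix versus factor-structured covariance.
full_params = n_assets * (n_assets + 1) // 2
factor_params = n_assets * n_factors + n_factors * (n_factors + 1) // 2 + n_assets
print(full_params, factor_params)  # 210 135
```

With 20 assets and 5 factors, the structured covariance needs 135 parameters instead of 210, and the gap widens quickly as the number of assets grows. The two terms of `asset_cov` are the systematic and security-specific components of risk.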
To be compatible with scikit-learn, the fit method takes X as the asset returns and y as the factor returns. Note that y is in lowercase even for a 2D array (more than one factor). This is for consistency with the scikit-learn API.
In this tutorial we will build a Maximum Sharpe Ratio portfolio using the FactorModel estimator.
Data#
We load the S&P 500 dataset composed of the daily prices of 20 assets from the SPX Index composition and the Factors dataset composed of the daily prices of 5 ETFs representing common factors:
from plotly.io import show
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import train_test_split
from skfolio import Population, RiskMeasure
from skfolio.datasets import load_factors_dataset, load_sp500_dataset
from skfolio.moments import GerberCovariance, ShrunkMu
from skfolio.optimization import MeanRisk, ObjectiveFunction
from skfolio.preprocessing import prices_to_returns
from skfolio.prior import EmpiricalPrior, FactorModel, LoadingMatrixRegression
prices = load_sp500_dataset()
factor_prices = load_factors_dataset()
prices = prices["2014":]
factor_prices = factor_prices["2014":]
X, y = prices_to_returns(prices, factor_prices)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, shuffle=False)
Factor Model#
We create a Maximum Sharpe Ratio model using the Factor Model that we fit on the training set:
model_factor_1 = MeanRisk(
    risk_measure=RiskMeasure.VARIANCE,
    objective_function=ObjectiveFunction.MAXIMIZE_RATIO,
    prior_estimator=FactorModel(),
    portfolio_params=dict(name="Factor Model 1"),
)
model_factor_1.fit(X_train, y_train)
model_factor_1.weights_
array([1.03294289e-06, 1.27482685e-03, 4.19682805e-07, 3.34130827e-06,
7.36838289e-07, 1.28824408e-06, 5.13031432e-02, 6.35619183e-02,
6.14804836e-07, 1.79106051e-01, 5.03130911e-02, 7.13734379e-02,
4.13002526e-02, 2.27978407e-01, 5.13348034e-02, 1.44130375e-01,
2.99026117e-07, 6.19737850e-02, 5.63413085e-02, 8.67773200e-07])
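For intuition about the objective, the Sharpe ratio of a weight vector can be computed directly from expected returns and a covariance matrix. The sketch below assumes a zero risk-free rate and ignores any weight constraints that the optimizer may apply, so its closed-form solution (weights proportional to the inverse covariance times the expected returns) will generally differ from the weights printed above. The numbers are made up for illustration.

```python
import numpy as np

# Illustrative annualized moments for three assets (made-up values).
mu = np.array([0.08, 0.05, 0.03])
cov = np.array(
    [
        [0.040, 0.006, 0.002],
        [0.006, 0.025, 0.003],
        [0.002, 0.003, 0.010],
    ]
)


def sharpe_ratio(weights: np.ndarray, mu: np.ndarray, cov: np.ndarray) -> float:
    """Expected return divided by standard deviation (risk-free rate assumed 0)."""
    return float(weights @ mu / np.sqrt(weights @ cov @ weights))


# Unconstrained maximizer: w proportional to inv(cov) @ mu, rescaled to sum to 1.
raw = np.linalg.solve(cov, mu)
tangency = raw / raw.sum()
equal = np.full(3, 1.0 / 3.0)

print(sharpe_ratio(tangency, mu, cov) >= sharpe_ratio(equal, mu, cov))  # True
```

Because the ratio is unchanged when the weights are rescaled by a positive constant, the budget constraint only fixes the scale of the solution, not its direction.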
We can change the BaseLoadingMatrix that estimates the loading matrix (betas) of the factors. The default is the LoadingMatrixRegression, which fits the factors using a LassoCV on each asset separately. For example, let’s change the LassoCV into a RidgeCV without intercept and use parallelization:
model_factor_2 = MeanRisk(
    risk_measure=RiskMeasure.VARIANCE,
    objective_function=ObjectiveFunction.MAXIMIZE_RATIO,
    prior_estimator=FactorModel(
        loading_matrix_estimator=LoadingMatrixRegression(
            linear_regressor=RidgeCV(fit_intercept=False), n_jobs=-1
        )
    ),
    portfolio_params=dict(name="Factor Model 2"),
)
model_factor_2.fit(X_train, y_train)
model_factor_2.weights_
array([3.97758339e-02, 6.57843874e-03, 2.18405141e-02, 8.98258882e-03,
3.16197378e-02, 1.42391168e-02, 8.00124906e-02, 8.32090802e-02,
4.74782930e-02, 8.59470407e-02, 4.59776221e-02, 5.91778878e-02,
8.42236770e-02, 1.05684777e-01, 6.43841778e-02, 7.94729901e-02,
3.76786711e-05, 5.23695742e-02, 4.35215146e-02, 4.54669667e-02])
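To see what a per-asset regression of this kind computes, the sketch below builds a loading matrix with a closed-form ridge regression without intercept, one asset at a time. It uses a fixed, hand-picked penalty rather than the cross-validated penalty that RidgeCV selects, and the `ridge_loadings` helper is hypothetical, not part of skfolio or scikit-learn.

```python
import numpy as np


def ridge_loadings(
    asset_returns: np.ndarray, factor_returns: np.ndarray, alpha: float
) -> np.ndarray:
    """Ridge betas without intercept, one regression per asset (illustrative)."""
    n_factors = factor_returns.shape[1]
    # Penalized normal equations: (F'F + alpha * I) beta = F'x
    gram = factor_returns.T @ factor_returns + alpha * np.eye(n_factors)
    rows = []
    for asset_column in asset_returns.T:  # each asset is regressed separately
        rows.append(np.linalg.solve(gram, factor_returns.T @ asset_column))
    return np.vstack(rows)  # shape (n_assets, n_factors)


# Synthetic data with known loadings (made-up numbers).
rng = np.random.default_rng(1)
factors = rng.normal(0.0, 0.01, size=(300, 5))
betas = rng.normal(1.0, 0.3, size=(8, 5))
assets = factors @ betas.T + rng.normal(0.0, 0.005, size=(300, 8))

no_penalty = ridge_loadings(assets, factors, alpha=0.0)  # plain least squares
penalized = ridge_loadings(assets, factors, alpha=0.03)  # shrunk toward zero
print(no_penalty.shape, np.linalg.norm(penalized) < np.linalg.norm(no_penalty))
```

Since each asset's regression is independent of the others, the loop is trivially parallelizable, which is what an `n_jobs` setting can exploit.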
We can also change the prior estimator of the factors. It is used to estimate the PriorModel containing the factors’ expected returns and covariance matrix. For example, let’s estimate the factors’ expected returns with James-Stein shrinkage and the factors’ covariance matrix with the Gerber covariance estimator:
model_factor_3 = MeanRisk(
    risk_measure=RiskMeasure.VARIANCE,
    objective_function=ObjectiveFunction.MAXIMIZE_RATIO,
    prior_estimator=FactorModel(
        factor_prior_estimator=EmpiricalPrior(
            mu_estimator=ShrunkMu(), covariance_estimator=GerberCovariance()
        )
    ),
    portfolio_params=dict(name="Factor Model 3"),
)
model_factor_3.fit(X_train, y_train)
model_factor_3.weights_
array([4.86490688e-07, 4.38230191e-07, 4.24408219e-08, 6.69653310e-08,
5.11878211e-08, 6.14581598e-08, 1.68436387e-02, 2.08439608e-06,
5.27854151e-08, 6.45513596e-02, 6.24004728e-02, 9.61498232e-02,
3.68209826e-01, 2.44692220e-01, 5.86512134e-07, 9.30385158e-06,
2.19939734e-08, 1.47139096e-01, 3.09340377e-07, 5.81316428e-08])
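The general idea behind shrinking expected returns can be sketched as a convex combination of the sample means and a common target. The example below shrinks toward the grand mean with a hand-picked intensity; it does not compute the data-driven James-Stein intensity, and the `shrink_toward_grand_mean` helper and the numbers are hypothetical.

```python
import numpy as np


def shrink_toward_grand_mean(sample_mu: np.ndarray, intensity: float) -> np.ndarray:
    """Blend sample means with their grand mean; intensity must lie in [0, 1]."""
    if not 0.0 <= intensity <= 1.0:
        raise ValueError("intensity must be between 0 and 1")
    target = np.full_like(sample_mu, sample_mu.mean())
    return (1.0 - intensity) * sample_mu + intensity * target


# Made-up sample means of five factor returns.
sample_mu = np.array([0.0012, 0.0003, -0.0004, 0.0008, 0.0001])
shrunk_mu = shrink_toward_grand_mean(sample_mu, intensity=0.4)

# The average is preserved while the cross-sectional dispersion is reduced.
print(np.isclose(shrunk_mu.mean(), sample_mu.mean()), shrunk_mu.std() < sample_mu.std())
```

An intensity of 0 returns the sample means unchanged and an intensity of 1 replaces every entry with the grand mean; shrinkage estimators differ mainly in how they choose this intensity from the data.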
Factor Analysis#
Each fitted estimator is saved with a trailing underscore. For example, we can access the fitted prior estimator with:
prior_estimator = model_factor_3.prior_estimator_
We can access the prior model with:
prior_model = prior_estimator.prior_model_
We can access the loading matrix with:
loading_matrix = prior_estimator.loading_matrix_estimator_.loading_matrix_
Empirical Model#
For comparison, we also create a Maximum Sharpe Ratio model using the default Empirical estimator:
model_empirical = MeanRisk(
    risk_measure=RiskMeasure.VARIANCE,
    objective_function=ObjectiveFunction.MAXIMIZE_RATIO,
    portfolio_params=dict(name="Empirical"),
)
model_empirical.fit(X_train)
model_empirical.weights_
array([1.01561518e-01, 7.81165193e-02, 6.29030037e-07, 1.89005488e-02,
3.05610119e-07, 1.55770502e-07, 1.10594710e-01, 1.22328443e-06,
1.56471742e-06, 3.39453276e-06, 1.62631058e-01, 1.92171374e-06,
1.77783711e-01, 9.61805760e-02, 4.64493062e-07, 9.68566446e-03,
7.41771353e-08, 2.44533886e-01, 1.83984287e-06, 2.34178301e-07])
Prediction#
We predict each model on the test set:
ptf_factor_1_test = model_factor_1.predict(X_test)
ptf_factor_2_test = model_factor_2.predict(X_test)
ptf_factor_3_test = model_factor_3.predict(X_test)
ptf_empirical_test = model_empirical.predict(X_test)
population = Population(
    [ptf_factor_1_test, ptf_factor_2_test, ptf_factor_3_test, ptf_empirical_test]
)
fig = population.plot_cumulative_returns()
show(fig)
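Conceptually, evaluating fitted weights on a test set amounts to applying them to the out-of-sample asset returns and accumulating the resulting portfolio returns. The sketch below shows this with NumPy on synthetic data, assuming the weights are held constant over the test period. It computes both a simple and a compounded cumulative series; which of the two the plot above uses is not stated in this tutorial.

```python
import numpy as np

# Synthetic out-of-sample returns for 3 assets and illustrative fixed weights.
rng = np.random.default_rng(7)
test_returns = rng.normal(0.0005, 0.01, size=(100, 3))
weights = np.array([0.5, 0.3, 0.2])

# One portfolio return per observation: weighted sum of asset returns.
portfolio_returns = test_returns @ weights

# Two common ways to accumulate returns over time.
cumulative_simple = np.cumsum(portfolio_returns)
cumulative_compounded = np.cumprod(1.0 + portfolio_returns) - 1.0

print(portfolio_returns.shape, cumulative_simple[-1], cumulative_compounded[-1])
```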
Let’s plot the portfolios’ composition:
population.plot_composition()