
Factor Model#

This tutorial shows how to use the FactorModel estimator in the MeanRisk optimization.

A prior estimator fits a PriorModel containing the distribution estimate of asset returns. It represents the investor’s prior beliefs about the model used to estimate this distribution.

The PriorModel is a dataclass containing:

mu: Expected returns estimation

covariance: Covariance matrix estimation

returns: Asset returns estimation

cholesky: Lower-triangular Cholesky factor of the covariance estimation (optional)
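
To make the optional cholesky field concrete, here is a minimal NumPy sketch of a lower-triangular Cholesky factor. The 3-asset covariance matrix is made up for illustration and the variable names are assumptions; this is not skfolio's own code.

```python
import numpy as np

# Hypothetical 3-asset covariance matrix (symmetric positive definite).
covariance = np.array(
    [
        [0.040, 0.006, 0.010],
        [0.006, 0.090, 0.012],
        [0.010, 0.012, 0.160],
    ]
)

# Lower-triangular factor L such that L @ L.T reproduces the covariance.
cholesky = np.linalg.cholesky(covariance)

# Check both defining properties of the factor.
is_lower_triangular = np.allclose(cholesky, np.tril(cholesky))
reconstruction_ok = np.allclose(cholesky @ cholesky.T, covariance)
print(is_lower_triangular, reconstruction_ok)
```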

The FactorModel estimator estimates the PriorModel using a factor model and a prior estimator of the factors’ returns. The purpose of factor models is to impose a structure on financial variables and their covariance matrix by explaining them through a small number of common factors. This can help overcome estimation error by reducing the number of parameters, i.e., the dimensionality of the estimation problem, making portfolio optimization more robust against noise in the data. Factor models also provide a decomposition of financial risk into systematic and security-specific components.
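
The dimensionality argument can be illustrated with the textbook factor-model moments, where asset covariance is rebuilt from the loadings, the factor covariance and a diagonal of residual variances. The sketch below uses random, hypothetical inputs and omits intercepts. It shows the general technique and is not necessarily how FactorModel computes its output internally.

```python
import numpy as np


def count_covariance_parameters(n_assets, n_factors):
    """Return (full, factor) free-parameter counts for the covariance matrix."""
    full = n_assets * (n_assets + 1) // 2
    factor = (
        n_assets * n_factors  # loadings (betas)
        + n_factors * (n_factors + 1) // 2  # factor covariance
        + n_assets  # one residual variance per asset
    )
    return full, factor


rng = np.random.default_rng(0)
n_assets, n_factors = 20, 5

# Hypothetical inputs; in practice these are estimated from data.
loading_matrix = rng.normal(size=(n_assets, n_factors))
factor_mu = rng.normal(scale=1e-3, size=n_factors)
root = rng.normal(scale=1e-2, size=(n_factors, n_factors))
factor_covariance = root @ root.T + 1e-6 * np.eye(n_factors)  # positive definite
residual_variance = rng.uniform(1e-5, 1e-4, size=n_assets)

# Textbook factor-model moments (intercepts omitted for brevity).
asset_mu = loading_matrix @ factor_mu
asset_covariance = (
    loading_matrix @ factor_covariance @ loading_matrix.T
    + np.diag(residual_variance)
)

print(count_covariance_parameters(20, 5))
print(count_covariance_parameters(500, 5))
```

With 20 assets and 5 factors the count falls from 210 to 135 parameters. With 500 assets it falls from 125250 to 3015, which is where the structure pays off most.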

To be compatible with scikit-learn, the fit method takes X as the asset returns and y as the factor returns. Note that y is lowercase even when it is a 2D array (more than one factor), for consistency with the scikit-learn API.

In this tutorial we will build a Maximum Sharpe Ratio portfolio using the FactorModel estimator.

Data#

We load the S&P 500 dataset, composed of the daily prices of 20 assets from the SPX Index composition, and the Factors dataset, composed of the daily prices of 5 ETFs representing common factors:

from plotly.io import show
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import train_test_split

from skfolio import Population, RiskMeasure
from skfolio.datasets import load_factors_dataset, load_sp500_dataset
from skfolio.moments import GerberCovariance, ShrunkMu
from skfolio.optimization import MeanRisk, ObjectiveFunction
from skfolio.preprocessing import prices_to_returns
from skfolio.prior import EmpiricalPrior, FactorModel, LoadingMatrixRegression

prices = load_sp500_dataset()
factor_prices = load_factors_dataset()

X, y = prices_to_returns(prices, factor_prices)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, shuffle=False)
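
Passing shuffle=False keeps the observations in chronological order, so the test set lies strictly after the training set and no future data leaks into the fit. The pure-Python sketch below mimics that behaviour on row indices. The helper name and the sample size of 1000 are hypothetical, and the rounding rule is an assumption about how the test-set size is derived.

```python
import math


def chronological_split(n_observations, test_size):
    """Split row indices in order: the first block trains, the last block tests."""
    # Assumed rounding: the test-set size is rounded up.
    n_test = math.ceil(test_size * n_observations)
    n_train = n_observations - n_test
    return range(0, n_train), range(n_train, n_observations)


train_idx, test_idx = chronological_split(n_observations=1000, test_size=0.33)

# Every training row precedes every test row, so there is no look-ahead.
print(len(train_idx), len(test_idx), max(train_idx) < min(test_idx))
```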

Factor Model#

We create a Maximum Sharpe Ratio model using the Factor Model that we fit on the training set:

model_factor_1 = MeanRisk(
    risk_measure=RiskMeasure.VARIANCE,
    objective_function=ObjectiveFunction.MAXIMIZE_RATIO,
    prior_estimator=FactorModel(),
    portfolio_params=dict(name="Factor Model 1"),
)
model_factor_1.fit(X_train, y_train)
model_factor_1.weights_

array([1.03294288e-06, 1.27482685e-03, 4.19682801e-07, 3.34130826e-06,
       7.36838280e-07, 1.28824407e-06, 5.13031432e-02, 6.35619183e-02,
       6.14804829e-07, 1.79106051e-01, 5.03130911e-02, 7.13734379e-02,
       4.13002526e-02, 2.27978407e-01, 5.13348034e-02, 1.44130375e-01,
       2.99026115e-07, 6.19737850e-02, 5.63413085e-02, 8.67773189e-07])

We can change the BaseLoadingMatrix that estimates the loading matrix (betas) of the factors.

The default is the LoadingMatrixRegression, which fits the factors using a LassoCV on each asset separately.
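
The idea of regressing each asset separately on the factors can be sketched with NumPy. Ordinary least squares stands in here for LassoCV to keep the example dependency-free, and the data is synthetic. Treat it as an illustration of how a loading matrix of shape (n_assets, n_factors) arises, not as the library's implementation.

```python
import numpy as np

rng = np.random.default_rng(42)
n_observations, n_assets, n_factors = 500, 4, 2

# Synthetic data following the tutorial's convention:
# y holds the factor returns, X holds the asset returns.
y = rng.normal(scale=0.01, size=(n_observations, n_factors))
true_betas = rng.normal(size=(n_assets, n_factors))
noise = rng.normal(scale=0.001, size=(n_observations, n_assets))
X = y @ true_betas.T + noise

# Design matrix with an intercept column followed by the factors.
design = np.column_stack([np.ones(n_observations), y])

loading_matrix = np.empty((n_assets, n_factors))
intercepts = np.empty(n_assets)
for i in range(n_assets):
    # One independent regression per asset (plain OLS, not LassoCV).
    coef, *_ = np.linalg.lstsq(design, X[:, i], rcond=None)
    intercepts[i] = coef[0]
    loading_matrix[i] = coef[1:]

print(loading_matrix.shape)
```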

For example, let’s change the LassoCV into a RidgeCV without intercept and use parallelization:

model_factor_2 = MeanRisk(
    risk_measure=RiskMeasure.VARIANCE,
    objective_function=ObjectiveFunction.MAXIMIZE_RATIO,
    prior_estimator=FactorModel(
        loading_matrix_estimator=LoadingMatrixRegression(
            linear_regressor=RidgeCV(fit_intercept=False), n_jobs=-1
        )
    ),
    portfolio_params=dict(name="Factor Model 2"),
)
model_factor_2.fit(X_train, y_train)
model_factor_2.weights_

array([3.97758339e-02, 6.57843874e-03, 2.18405141e-02, 8.98258882e-03,
       3.16197378e-02, 1.42391168e-02, 8.00124906e-02, 8.32090802e-02,
       4.74782930e-02, 8.59470407e-02, 4.59776221e-02, 5.91778878e-02,
       8.42236770e-02, 1.05684777e-01, 6.43841778e-02, 7.94729901e-02,
       3.76786713e-05, 5.23695742e-02, 4.35215146e-02, 4.54669667e-02])

We can also change the prior estimator of the factors. It is used to estimate the PriorModel containing the factors’ expected returns and covariance matrix.

For example, let’s estimate the factors’ expected returns with James-Stein shrinkage and the factors’ covariance matrix with the Gerber covariance estimator:

model_factor_3 = MeanRisk(
    risk_measure=RiskMeasure.VARIANCE,
    objective_function=ObjectiveFunction.MAXIMIZE_RATIO,
    prior_estimator=FactorModel(
        factor_prior_estimator=EmpiricalPrior(
            mu_estimator=ShrunkMu(), covariance_estimator=GerberCovariance()
        )
    ),
    portfolio_params=dict(name="Factor Model 3"),
)
model_factor_3.fit(X_train, y_train)
model_factor_3.weights_

array([4.86490693e-07, 4.38230195e-07, 4.24408224e-08, 6.69653317e-08,
       5.11878216e-08, 6.14581604e-08, 1.68436387e-02, 2.08439610e-06,
       5.27854157e-08, 6.45513596e-02, 6.24004728e-02, 9.61498232e-02,
       3.68209826e-01, 2.44692220e-01, 5.86512140e-07, 9.30385171e-06,
       2.19939736e-08, 1.47139096e-01, 3.09340380e-07, 5.81316434e-08])

Factor Analysis#

Each fitted estimator is saved with a trailing underscore. For example, we can access the fitted prior estimator with:

prior_estimator = model_factor_3.prior_estimator_

We can access the prior model with:

prior_model = prior_estimator.prior_model_

We can access the loading matrix with:

loading_matrix = prior_estimator.loading_matrix_estimator_.loading_matrix_

Empirical Model#

For comparison, we also create a Maximum Sharpe Ratio model using the default Empirical estimator:

model_empirical = MeanRisk(
    risk_measure=RiskMeasure.VARIANCE,
    objective_function=ObjectiveFunction.MAXIMIZE_RATIO,
    portfolio_params=dict(name="Empirical"),
)
model_empirical.fit(X_train)
model_empirical.weights_

array([1.01561518e-01, 7.81165193e-02, 6.29030036e-07, 1.89005488e-02,
       3.05610119e-07, 1.55770502e-07, 1.10594710e-01, 1.22328443e-06,
       1.56471742e-06, 3.39453276e-06, 1.62631058e-01, 1.92171374e-06,
       1.77783711e-01, 9.61805760e-02, 4.64493062e-07, 9.68566446e-03,
       7.41771352e-08, 2.44533886e-01, 1.83984287e-06, 2.34178301e-07])

Prediction#

We predict on the test set with every model:

ptf_factor_1_test = model_factor_1.predict(X_test)
ptf_factor_2_test = model_factor_2.predict(X_test)
ptf_factor_3_test = model_factor_3.predict(X_test)
ptf_empirical_test = model_empirical.predict(X_test)

population = Population(
    [ptf_factor_1_test, ptf_factor_2_test, ptf_factor_3_test, ptf_empirical_test]
)

fig = population.plot_cumulative_returns()
show(fig)
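
At its core, each predicted portfolio applies the fitted weights to the test-set asset returns. The sketch below shows that weighted sum on a tiny made-up example. The array values and names are hypothetical, and the Portfolio object returned by predict also carries many other measures not shown here.

```python
import numpy as np

# Hypothetical test-set returns: 3 observations x 2 assets.
test_returns = np.array(
    [
        [0.01, -0.02],
        [0.00, 0.03],
        [-0.01, 0.01],
    ]
)

# Hypothetical fitted weights (long-only, summing to one).
weights = np.array([0.6, 0.4])

# One portfolio return per observation: the weighted sum of asset returns.
portfolio_returns = test_returns @ weights
print(portfolio_returns)
```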

Let’s plot the portfolios’ composition:

population.plot_composition()

