Factor Model#

This tutorial shows how to use the FactorModel estimator in the MeanRisk optimization.

A Prior Estimator in skfolio fits a ReturnDistribution containing your pre-optimization inputs (\(\mu\), \(\Sigma\), returns, sample weight, Cholesky decomposition).

The term “prior” is used in a general optimization sense, not confined to Bayesian priors. It denotes any a priori assumption or estimation method for the return distribution before optimization, unifying Frequentist, Bayesian, and Information-theoretic approaches into a single cohesive framework:

Frequentist:
- EmpiricalPrior
- FactorModel
Bayesian:
- BlackLitterman
Information-theoretic:
- EntropyPooling
- OpinionPooling

In skfolio’s API, all such methods share the same interface and adhere to scikit-learn’s estimator API: the fit method accepts X (the asset returns) and stores the resulting ReturnDistribution in its return_distribution_ attribute.

The ReturnDistribution is a dataclass containing:

mu: Estimated expected returns of shape (n_assets,)

covariance: Estimated covariance matrix of shape (n_assets, n_assets)

returns: (Estimated) asset returns of shape (n_observations, n_assets)

sample_weight: Sample weight for each observation of shape (n_observations,) (optional)

cholesky: Lower-triangular Cholesky factor of the covariance (optional)
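
To make the shapes above concrete, here is a minimal sketch of a container with the same fields, filled from sample moments. The class `ReturnDistributionSketch` and the helper `empirical_distribution` are hypothetical names introduced for illustration only; they are not skfolio's implementation.

```python
from dataclasses import dataclass
from typing import Optional

import numpy as np


@dataclass(frozen=True)
class ReturnDistributionSketch:
    """Illustrative stand-in, NOT skfolio's ReturnDistribution class."""

    mu: np.ndarray  # (n_assets,)
    covariance: np.ndarray  # (n_assets, n_assets)
    returns: np.ndarray  # (n_observations, n_assets)
    sample_weight: Optional[np.ndarray] = None  # (n_observations,)
    cholesky: Optional[np.ndarray] = None  # (n_assets, n_assets), lower-triangular


def empirical_distribution(returns):
    """Fill the container with sample moments (hypothetical helper)."""
    returns = np.asarray(returns, dtype=float)
    mu = returns.mean(axis=0)
    covariance = np.cov(returns, rowvar=False)
    cholesky = np.linalg.cholesky(covariance)  # lower-triangular factor
    return ReturnDistributionSketch(mu, covariance, returns, cholesky=cholesky)


# Toy data: 250 observations of 4 synthetic assets.
rng = np.random.default_rng(0)
toy_returns = rng.normal(0.0, 0.01, size=(250, 4))
dist = empirical_distribution(toy_returns)
print(dist.mu.shape, dist.covariance.shape, dist.returns.shape)
```

The Cholesky factor satisfies `cholesky @ cholesky.T == covariance` up to rounding, which is why it can be stored alongside the covariance and reused.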

The FactorModel estimator estimates the ReturnDistribution by fitting a factor model on asset returns alongside a specified prior estimator for the factor returns.
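
As a point of reference, the standard linear factor model can be written as follows. The symbols \(\alpha\) (intercepts), \(B\) (loading matrix of shape (n_assets, n_factors)), \(F_t\) (factor returns), \(\varepsilon_t\) (residuals) and \(D\) (diagonal matrix of residual variances) are notation introduced here. Whether the intercept and residual terms are included in a given fit depends on the estimator's options, so treat this as the textbook form rather than a statement of skfolio's exact computation:

\[
X_t = \alpha + B\,F_t + \varepsilon_t,
\qquad
\mu = \alpha + B\,\mu_F,
\qquad
\Sigma = B\,\Sigma_F\,B^\top + D
\]

Here \(\mu_F\) and \(\Sigma_F\) are the expected returns and covariance of the factors, which is what the prior estimator for the factor returns provides.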

The purpose of factor models is to impose a structure on financial variables and their covariance matrix by explaining them through a small number of common factors. This can help overcome estimation error by reducing the number of parameters, i.e., the dimensionality of the estimation problem, making portfolio optimization more robust against noise in the data. Factor models also provide a decomposition of financial risk into systematic and security-specific components.
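
The dimensionality argument can be checked with simple counting. The sketch below assumes a linear factor model whose covariance is built from a loading matrix, a factor covariance and one residual variance per asset; the function names are hypothetical.

```python
def full_covariance_params(n_assets):
    """Free parameters of an unrestricted symmetric covariance matrix."""
    return n_assets * (n_assets + 1) // 2


def factor_covariance_params(n_assets, n_factors):
    """Free parameters when the covariance comes from a linear factor model."""
    loadings = n_assets * n_factors  # one beta per (asset, factor) pair
    factor_cov = n_factors * (n_factors + 1) // 2  # symmetric factor covariance
    residual_var = n_assets  # one idiosyncratic variance per asset
    return loadings + factor_cov + residual_var


print(full_covariance_params(20), factor_covariance_params(20, 5))  # 210 135
print(full_covariance_params(500), factor_covariance_params(500, 5))  # 125250 3015
```

With 20 assets and 5 factors the saving is modest, but it grows quickly with the size of the universe: the unrestricted count grows quadratically in the number of assets, while the factor-model count grows linearly.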

To be fully compatible with scikit-learn, the fit method takes X as the asset returns and y as the factor returns. Note that y is lowercase even when it is a 2D array (more than one factor), for consistency with the scikit-learn API.

In this tutorial we will build a Maximum Sharpe Ratio portfolio using the FactorModel estimator.
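
For intuition about what "Maximum Sharpe Ratio" means, here is a small sketch of the unconstrained closed-form solution, where the weights are proportional to the inverse covariance times the expected returns. This is an assumption-laden illustration (zero risk-free rate, no weight constraints, toy inputs); it is not how MeanRisk solves the problem, which is a constrained optimization.

```python
import numpy as np


def sharpe_ratio(weights, mu, covariance):
    """Ratio of expected return to volatility (risk-free rate assumed zero)."""
    return (weights @ mu) / np.sqrt(weights @ covariance @ weights)


def tangency_weights(mu, covariance):
    """Unconstrained maximum Sharpe ratio weights, rescaled to sum to one."""
    raw = np.linalg.solve(covariance, mu)  # proportional to inv(covariance) @ mu
    return raw / raw.sum()  # assumes raw.sum() > 0, true for the toy inputs below


# Toy inputs (hypothetical numbers, not taken from the datasets in this tutorial).
mu = np.array([0.05, 0.07, 0.06])
covariance = np.array(
    [
        [0.0400, 0.0100, 0.0000],
        [0.0100, 0.0900, 0.0200],
        [0.0000, 0.0200, 0.0625],
    ]
)

weights = tangency_weights(mu, covariance)
equal_weights = np.full(3, 1 / 3)
print(sharpe_ratio(weights, mu, covariance))
print(sharpe_ratio(equal_weights, mu, covariance))
```

The first printed ratio is never below the second, because no fully invested portfolio can beat the tangency weights for the same inputs. The quality of the result therefore depends entirely on the quality of the estimated expected returns and covariance, which is where the prior estimator comes in.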

Data#

We load the S&P 500 dataset, composed of the daily prices of 20 assets from the SPX Index composition, and the Factors dataset, composed of the daily prices of 5 ETFs representing common factors:

from plotly.io import show
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import train_test_split

from skfolio import Population, RiskMeasure
from skfolio.datasets import load_factors_dataset, load_sp500_dataset
from skfolio.moments import GerberCovariance, ShrunkMu
from skfolio.optimization import MeanRisk, ObjectiveFunction
from skfolio.preprocessing import prices_to_returns
from skfolio.prior import EmpiricalPrior, FactorModel, LoadingMatrixRegression

prices = load_sp500_dataset()
factor_prices = load_factors_dataset()

X, y = prices_to_returns(prices, factor_prices)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, shuffle=False)
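
The two preprocessing steps above can be sketched with plain NumPy on toy prices. This is a simplified illustration under stated assumptions: linear (not logarithmic) returns, and a split that keeps the first rows for training and the last rows for testing, which is what `shuffle=False` achieves. The helper names and the row-count rounding are illustrative, and the real functions also handle dates and alignment of X and y.

```python
import math

import numpy as np


def linear_returns(prices):
    """Linear returns: price_t / price_{t-1} - 1, one row fewer than prices."""
    prices = np.asarray(prices, dtype=float)
    return prices[1:] / prices[:-1] - 1.0


def chronological_split(returns, test_size):
    """Keep the earliest rows for training and the latest rows for testing."""
    n_test = math.ceil(test_size * len(returns))
    n_train = len(returns) - n_test
    return returns[:n_train], returns[n_train:]


# Toy prices for two hypothetical assets over four days.
toy_prices = np.array(
    [
        [100.0, 50.0],
        [110.0, 55.0],
        [99.0, 55.0],
        [108.9, 49.5],
    ]
)
toy_returns = linear_returns(toy_prices)
train, test = chronological_split(toy_returns, test_size=0.33)
print(toy_returns)
print(len(train), len(test))  # 2 1
```

Keeping the split chronological avoids fitting on observations that come after the test period.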

Factor Model#

We create a Maximum Sharpe Ratio model using the Factor Model that we fit on the training set:

model_factor_1 = MeanRisk(
    risk_measure=RiskMeasure.VARIANCE,
    objective_function=ObjectiveFunction.MAXIMIZE_RATIO,
    prior_estimator=FactorModel(),
    portfolio_params=dict(name="Factor Model 1"),
)
model_factor_1.fit(X_train, y_train)
model_factor_1.weights_

array([1.03294289e-06, 1.27482685e-03, 4.19682805e-07, 3.34130826e-06,
       7.36838288e-07, 1.28824408e-06, 5.13031432e-02, 6.35619183e-02,
       6.14804835e-07, 1.79106051e-01, 5.03130911e-02, 7.13734379e-02,
       4.13002526e-02, 2.27978407e-01, 5.13348034e-02, 1.44130375e-01,
       2.99026117e-07, 6.19737850e-02, 5.63413085e-02, 8.67773199e-07])

We can change the BaseLoadingMatrix that estimates the loading matrix (betas) of the factors.

The default is the LoadingMatrixRegression, which fits the factors using a LassoCV on each asset separately.
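
The idea of fitting one regression per asset can be sketched as follows. To keep the example dependency-free, ordinary least squares via `numpy.linalg.lstsq` is swapped in for LassoCV, and the data are synthetic with known loadings; all variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(42)
n_observations, n_assets, n_factors = 500, 3, 2

# Ground truth used to simulate the data (hypothetical values).
true_loadings = np.array([[1.0, 0.2], [0.5, -0.3], [0.0, 0.8]])
true_intercepts = np.array([0.0001, 0.0, -0.0002])

factor_returns = rng.normal(0.0, 0.01, size=(n_observations, n_factors))
noise = rng.normal(0.0, 0.001, size=(n_observations, n_assets))
asset_returns = true_intercepts + factor_returns @ true_loadings.T + noise

# Design matrix with a leading column of ones for the intercept.
design = np.column_stack([np.ones(n_observations), factor_returns])

loading_matrix = np.empty((n_assets, n_factors))
intercepts = np.empty(n_assets)
for i in range(n_assets):
    # One independent regression per asset: asset i on all factors.
    coef, *_ = np.linalg.lstsq(design, asset_returns[:, i], rcond=None)
    intercepts[i] = coef[0]
    loading_matrix[i] = coef[1:]

print(np.round(loading_matrix, 2))
```

Each row of `loading_matrix` holds the betas of one asset, so the result has shape (n_assets, n_factors). A regularized regressor changes how each row is estimated, not the overall layout.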

For example, let’s replace the LassoCV with a RidgeCV without intercept and use parallelization:

model_factor_2 = MeanRisk(
    risk_measure=RiskMeasure.VARIANCE,
    objective_function=ObjectiveFunction.MAXIMIZE_RATIO,
    prior_estimator=FactorModel(
        loading_matrix_estimator=LoadingMatrixRegression(
            linear_regressor=RidgeCV(fit_intercept=False), n_jobs=-1
        )
    ),
    portfolio_params=dict(name="Factor Model 2"),
)
model_factor_2.fit(X_train, y_train)
model_factor_2.weights_

array([3.97758339e-02, 6.57843874e-03, 2.18405141e-02, 8.98258882e-03,
       3.16197378e-02, 1.42391168e-02, 8.00124906e-02, 8.32090802e-02,
       4.74782930e-02, 8.59470407e-02, 4.59776221e-02, 5.91778878e-02,
       8.42236770e-02, 1.05684777e-01, 6.43841778e-02, 7.94729901e-02,
       3.76786713e-05, 5.23695742e-02, 4.35215146e-02, 4.54669667e-02])

We can also change the prior estimator of the factors. It is used to estimate the ReturnDistribution containing the factors’ expected returns and covariance matrix.

For example, let’s estimate the factors’ expected returns with James-Stein shrinkage and the factors’ covariance matrix with the Gerber covariance estimator:

model_factor_3 = MeanRisk(
    risk_measure=RiskMeasure.VARIANCE,
    objective_function=ObjectiveFunction.MAXIMIZE_RATIO,
    prior_estimator=FactorModel(
        factor_prior_estimator=EmpiricalPrior(
            mu_estimator=ShrunkMu(), covariance_estimator=GerberCovariance()
        )
    ),
    portfolio_params=dict(name="Factor Model 3"),
)
model_factor_3.fit(X_train, y_train)
model_factor_3.weights_

array([4.86490689e-07, 4.38230192e-07, 4.24408221e-08, 6.69653312e-08,
       5.11878213e-08, 6.14581600e-08, 1.68436387e-02, 2.08439609e-06,
       5.27854153e-08, 6.45513596e-02, 6.24004728e-02, 9.61498232e-02,
       3.68209826e-01, 2.44692220e-01, 5.86512136e-07, 9.30385161e-06,
       2.19939734e-08, 1.47139096e-01, 3.09340378e-07, 5.81316430e-08])
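
To give a feel for what James-Stein shrinkage does to expected returns, here is a sketch of a textbook positive-part variant that pulls the sample means toward their grand mean. The function `james_stein_mu` is a hypothetical helper, and the shrinkage constant differs across references, so the numbers will not match ShrunkMu exactly.

```python
import numpy as np


def james_stein_mu(returns):
    """Shrink sample means toward their grand mean (textbook variant)."""
    returns = np.asarray(returns, dtype=float)
    n_observations, n_assets = returns.shape  # requires n_assets > 2
    mu = returns.mean(axis=0)
    target = np.full(n_assets, mu.mean())  # grand mean as shrinkage target
    diff = mu - target
    covariance = np.cov(returns, rowvar=False)
    # Scaled Mahalanobis distance between the sample means and the target.
    distance = n_observations * diff @ np.linalg.solve(covariance, diff)
    # Shrinkage intensity, clipped to 1 (full shrinkage to the target).
    alpha = min(1.0, (n_assets - 2) / distance) if distance > 0 else 1.0
    return (1.0 - alpha) * mu + alpha * target, alpha


# Synthetic daily returns for five hypothetical factors.
rng = np.random.default_rng(7)
true_mu = np.array([0.0004, 0.0006, 0.0005, 0.0003, 0.0007])
toy_returns = rng.normal(true_mu, 0.01, size=(250, 5))

sample_mu = toy_returns.mean(axis=0)
shrunk_mu, alpha = james_stein_mu(toy_returns)
print(alpha)
print(sample_mu)
print(shrunk_mu)
```

The shrunk means keep the same average as the sample means but are less dispersed, which dampens the effect of noisy mean estimates on the optimized weights.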

Factor Analysis#

Each fitted estimator is saved with a trailing underscore. For example, we can access the fitted prior estimator with:

prior_estimator = model_factor_3.prior_estimator_

We can access the return distribution with:

return_distribution = prior_estimator.return_distribution_

We can access the loading matrix with:

loading_matrix = prior_estimator.loading_matrix_estimator_.loading_matrix_
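
Once a loading matrix and the factor moments are available, the asset-level moments follow from the standard linear factor model. The sketch below uses hypothetical toy values and assumes both intercepts and residual variances are part of the model; it illustrates the algebra, not skfolio's internal code.

```python
import numpy as np

# Toy inputs (hypothetical): 3 assets, 2 factors.
loading_matrix = np.array([[1.0, 0.2], [0.5, -0.3], [0.0, 0.8]])
intercepts = np.array([0.0001, 0.0, -0.0002])
residual_variance = np.array([1.0e-6, 2.0e-6, 1.5e-6])

factor_mu = np.array([0.0005, 0.0003])
factor_covariance = np.array([[1.0e-4, 2.0e-5], [2.0e-5, 4.0e-4]])

# Expected asset returns: intercept plus exposure to expected factor returns.
asset_mu = intercepts + loading_matrix @ factor_mu

# Systematic risk comes from the factors, specific risk from the residuals.
systematic = loading_matrix @ factor_covariance @ loading_matrix.T
asset_covariance = systematic + np.diag(residual_variance)

print(asset_mu)
print(asset_covariance)
```

The systematic part has rank at most n_factors, so the diagonal residual term is what makes the full asset covariance invertible when there are more assets than factors.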

Empirical Model#

For comparison, we also create a Maximum Sharpe Ratio model using the default EmpiricalPrior estimator, which needs only the asset returns:

model_empirical = MeanRisk(
    risk_measure=RiskMeasure.VARIANCE,
    objective_function=ObjectiveFunction.MAXIMIZE_RATIO,
    portfolio_params=dict(name="Empirical"),
)
model_empirical.fit(X_train)
model_empirical.weights_

array([1.01561518e-01, 7.81165193e-02, 6.29030035e-07, 1.89005488e-02,
       3.05610118e-07, 1.55770502e-07, 1.10594710e-01, 1.22328443e-06,
       1.56471742e-06, 3.39453275e-06, 1.62631058e-01, 1.92171373e-06,
       1.77783711e-01, 9.61805760e-02, 4.64493061e-07, 9.68566446e-03,
       7.41771350e-08, 2.44533886e-01, 1.83984286e-06, 2.34178300e-07])

Prediction#

We apply each fitted model to the test set with predict and plot the cumulative returns:

ptf_factor_1_test = model_factor_1.predict(X_test)
ptf_factor_2_test = model_factor_2.predict(X_test)
ptf_factor_3_test = model_factor_3.predict(X_test)
ptf_empirical_test = model_empirical.predict(X_test)

population = Population(
    [ptf_factor_1_test, ptf_factor_2_test, ptf_factor_3_test, ptf_empirical_test]
)

fig = population.plot_cumulative_returns()
show(fig)
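
Conceptually, predicting on the test set applies the fitted weights to unseen returns. The sketch below shows that arithmetic on toy numbers, with both a summed and a compounded cumulative return; which convention the plot uses is not stated here, so both are shown as an assumption.

```python
import numpy as np

# Hypothetical fitted weights and test-set returns for three assets.
weights = np.array([0.5, 0.3, 0.2])
test_returns = np.array(
    [
        [0.01, -0.02, 0.00],
        [0.00, 0.01, 0.02],
        [-0.01, 0.00, 0.01],
    ]
)

# One portfolio return per observation: weighted sum of asset returns.
portfolio_returns = test_returns @ weights

# Two common conventions for cumulative performance.
cumulative_simple = np.cumsum(portfolio_returns)
cumulative_compounded = np.cumprod(1.0 + portfolio_returns) - 1.0

print(portfolio_returns)
print(cumulative_simple)
print(cumulative_compounded)
```

For small daily returns the two curves are close, and they diverge more over long horizons or with volatile portfolios.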

Let’s plot the portfolios’ composition:

population.plot_composition()
