Empirical Prior

This tutorial shows how to use the EmpiricalPrior estimator in the MeanRisk optimization.

A Prior Estimator in skfolio fits a ReturnDistribution containing your pre-optimization inputs (\(\mu\), \(\Sigma\), returns, sample weight, Cholesky decomposition).

The term “prior” is used in a general optimization sense and is not confined to Bayesian priors. It denotes any a priori assumption or estimation method for the return distribution before optimization, unifying Frequentist, Bayesian and Information-theoretic approaches in a single framework:

Frequentist:
- EmpiricalPrior
- FactorModel
- SyntheticData
Bayesian:
- BlackLitterman
Information-theoretic:
- EntropyPooling
- OpinionPooling

In skfolio’s API, all such methods share the same interface and adhere to scikit-learn’s estimator API: the fit method accepts X (the asset returns) and stores the resulting ReturnDistribution in its return_distribution_ attribute.

The ReturnDistribution is a dataclass containing:

- mu: Estimated expected returns of shape (n_assets,)
- covariance: Estimated covariance matrix of shape (n_assets, n_assets)
- returns: (Estimated) asset returns of shape (n_observations, n_assets)
- sample_weight: Sample weight for each observation of shape (n_observations,) (optional)
- cholesky: Lower-triangular Cholesky factor of the covariance (optional)
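
To make the fields and the fit convention concrete, here is a minimal NumPy sketch. `ToyReturnDistribution` and `ToyEmpiricalPrior` are hypothetical stand-ins written for illustration, not skfolio classes; they only mirror the field names listed above and the scikit-learn habit of storing fitted results in an attribute ending with an underscore.

```python
from dataclasses import dataclass
from typing import Optional

import numpy as np


@dataclass(frozen=True)
class ToyReturnDistribution:
    """Hypothetical container mirroring the fields described in the text."""

    mu: np.ndarray  # shape (n_assets,)
    covariance: np.ndarray  # shape (n_assets, n_assets)
    returns: np.ndarray  # shape (n_observations, n_assets)
    sample_weight: Optional[np.ndarray] = None  # shape (n_observations,)
    cholesky: Optional[np.ndarray] = None  # lower-triangular factor


class ToyEmpiricalPrior:
    """Hypothetical estimator: plain sample moments, scikit-learn style fit."""

    def fit(self, X):
        X = np.asarray(X, dtype=float)
        covariance = np.cov(X, rowvar=False)
        self.return_distribution_ = ToyReturnDistribution(
            mu=X.mean(axis=0),
            covariance=covariance,
            returns=X,
            cholesky=np.linalg.cholesky(covariance),
        )
        return self  # returning self allows chaining, as in scikit-learn


# Synthetic daily returns: 250 observations, 4 assets (made-up numbers).
rng = np.random.default_rng(0)
X = rng.normal(loc=0.0005, scale=0.01, size=(250, 4))

dist = ToyEmpiricalPrior().fit(X).return_distribution_
print(dist.mu.shape, dist.covariance.shape, dist.cholesky.shape)
```

The Cholesky factor `L` satisfies `L @ L.T == covariance` up to rounding, which is why it can be stored alongside the covariance instead of being recomputed.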

The EmpiricalPrior estimator estimates the ReturnDistribution by fitting its mu_estimator and covariance_estimator independently.
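
The idea of two independently fitted sub-estimators can be sketched with plain functions. Everything below is hypothetical and NumPy-only: `fit_moments`, `shrunk_mean` and its fixed `intensity` are illustrative names, and the fixed-intensity shrinkage is a simplification, not the data-driven intensity a James-Stein estimator would compute.

```python
import numpy as np


def sample_mean(X):
    """Plain sample mean per asset."""
    return X.mean(axis=0)


def shrunk_mean(X, intensity=0.5):
    """Pull each asset mean toward the grand mean by a fixed, assumed intensity."""
    mu = X.mean(axis=0)
    return (1.0 - intensity) * mu + intensity * mu.mean()


def sample_covariance(X):
    """Plain sample covariance matrix."""
    return np.cov(X, rowvar=False)


def fit_moments(X, mu_estimator=sample_mean, covariance_estimator=sample_covariance):
    """Fit the two moment estimators independently on the same returns."""
    X = np.asarray(X, dtype=float)
    return mu_estimator(X), covariance_estimator(X)


rng = np.random.default_rng(1)
X = rng.normal(0.0005, 0.01, size=(500, 5))

mu_a, cov_a = fit_moments(X)
mu_b, cov_b = fit_moments(X, mu_estimator=shrunk_mean)

# Swapping the mean estimator leaves the covariance estimate untouched.
print(np.allclose(cov_a, cov_b))
```

Because the two estimators do not share state, either one can be replaced without affecting the other, which is the property the composition relies on.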

In this tutorial we will build a Maximum Sharpe Ratio portfolio using the EmpiricalPrior estimator with James-Stein shrinkage for the estimation of expected returns and Denoising for the estimation of the covariance matrix.
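
For intuition about covariance denoising, the sketch below flattens the eigenvalues of the correlation matrix that fall under the theoretical Marchenko-Pastur upper edge. It is a simplified illustration under stated assumptions (unit noise variance, theoretical edge rather than a fitted one) and is not claimed to reproduce what `DenoiseCovariance` does internally; `denoise_covariance` is a hypothetical helper.

```python
import numpy as np


def denoise_covariance(cov, n_observations):
    """Replace 'noise' eigenvalues of the correlation matrix by their average."""
    std = np.sqrt(np.diag(cov))
    corr = cov / np.outer(std, std)
    eigvals, eigvecs = np.linalg.eigh(corr)

    n_assets = cov.shape[0]
    # Upper edge of the Marchenko-Pastur law for a pure-noise correlation
    # matrix, assuming unit noise variance.
    lambda_max = (1.0 + np.sqrt(n_assets / n_observations)) ** 2

    noise = eigvals < lambda_max
    if noise.any():
        eigvals = eigvals.copy()
        # Averaging keeps the trace of the correlation matrix unchanged.
        eigvals[noise] = eigvals[noise].mean()

    # Rebuild the correlation matrix and restore its unit diagonal.
    corr_dn = (eigvecs * eigvals) @ eigvecs.T
    scale = np.sqrt(np.diag(corr_dn))
    corr_dn = corr_dn / np.outer(scale, scale)

    # Convert back to a covariance using the original volatilities.
    return corr_dn * np.outer(std, std)


# Synthetic returns: one common factor plus idiosyncratic noise (made-up data).
rng = np.random.default_rng(2)
n_obs, n_assets = 1000, 10
factor = rng.normal(size=(n_obs, 1))
betas = rng.uniform(0.5, 1.5, size=(1, n_assets))
X = 0.01 * (factor * betas + rng.normal(size=(n_obs, n_assets)))

cov = np.cov(X, rowvar=False)
cov_dn = denoise_covariance(cov, n_obs)
print(np.allclose(np.diag(cov_dn), np.diag(cov)))
```

The asset variances are preserved by construction; only the correlation structure attributed to noise is smoothed.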

Data

We load the S&P 500 dataset composed of the daily prices of 20 assets from the SPX Index composition starting from 1990-01-02 up to 2022-12-28:

from plotly.io import show
from sklearn.model_selection import train_test_split

from skfolio import Population, RiskMeasure
from skfolio.datasets import load_sp500_dataset
from skfolio.moments import DenoiseCovariance, ShrunkMu
from skfolio.optimization import MeanRisk, ObjectiveFunction
from skfolio.preprocessing import prices_to_returns
from skfolio.prior import EmpiricalPrior

prices = load_sp500_dataset()
X = prices_to_returns(prices)
X_train, X_test = train_test_split(X, test_size=0.33, shuffle=False)
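
The two preprocessing steps can be illustrated on synthetic prices. This is a NumPy-only sketch assuming linear (simple) returns and a chronological split that holds out the last third of observations; the price paths are made up and the helper-free code is not the library implementation.

```python
import math

import numpy as np

rng = np.random.default_rng(42)

# Hypothetical price paths: 100 days, 3 assets, starting near 100.
daily_growth = 1.0 + rng.normal(0.0005, 0.01, size=(100, 3))
prices = 100.0 * np.cumprod(daily_growth, axis=0)

# Linear returns r_t = P_t / P_{t-1} - 1. The first day has no predecessor,
# so one observation is lost.
returns = prices[1:] / prices[:-1] - 1.0

# Chronological split: no shuffling, so the test set is strictly later than
# the training set and no future information leaks into the fit.
test_size = 0.33
n_test = math.ceil(test_size * len(returns))
X_train, X_test = returns[:-n_test], returns[-n_test:]

print(returns.shape, X_train.shape, X_test.shape)
```

Keeping the observations in time order matters for return series because a shuffled split would train on data that comes after the test period.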

Model

We create a Maximum Sharpe Ratio model with shrinkage for the estimation of the expected returns and denoising for the estimation of the covariance matrix:

model = MeanRisk(
    risk_measure=RiskMeasure.VARIANCE,
    objective_function=ObjectiveFunction.MAXIMIZE_RATIO,
    prior_estimator=EmpiricalPrior(
        mu_estimator=ShrunkMu(), covariance_estimator=DenoiseCovariance()
    ),
    portfolio_params=dict(name="Max Sharpe - ShrunkMu & DenoiseCovariance"),
)
model.fit(X_train)
model.weights_

array([5.29957573e-02, 5.62258517e-07, 2.19431588e-07, 5.78329525e-02,
       1.05704853e-01, 6.42635888e-07, 1.25145591e-02, 1.64813030e-01,
       3.95272190e-07, 8.40687639e-02, 1.20296728e-06, 1.33374542e-06,
       6.51482298e-02, 7.44911613e-02, 7.02327403e-06, 1.27177410e-01,
       3.87852705e-02, 6.81199403e-02, 4.34872838e-02, 1.04849409e-01])
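
For intuition about what a maximum Sharpe ratio objective rewards, here is the textbook closed-form solution under simplifying assumptions: a zero risk-free rate and no constraint other than weights summing to one. The model above is solved numerically and may apply additional weight constraints, so its weights will generally differ; the inputs below are made-up numbers.

```python
import numpy as np

# Hypothetical expected returns and covariance for three assets.
mu = np.array([0.08, 0.10, 0.04])
cov = np.array(
    [
        [0.040, 0.006, 0.000],
        [0.006, 0.090, 0.010],
        [0.000, 0.010, 0.010],
    ]
)


def sharpe(weights, mu, cov):
    """Expected return divided by volatility (risk-free rate assumed zero)."""
    return (weights @ mu) / np.sqrt(weights @ cov @ weights)


# Unconstrained tangency direction: solve cov @ raw = mu, then rescale so the
# weights sum to one.
raw = np.linalg.solve(cov, mu)
weights = raw / raw.sum()

equal = np.full(3, 1.0 / 3.0)
print(sharpe(weights, mu, cov) >= sharpe(equal, mu, cov))
```

The solution depends directly on the estimated `mu` and `cov`, which is why changing the moment estimators changes the resulting weights.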

Benchmark

For comparison, we also create a Maximum Sharpe Ratio model using the default moment estimators:

bench = MeanRisk(
    risk_measure=RiskMeasure.VARIANCE,
    objective_function=ObjectiveFunction.MAXIMIZE_RATIO,
    portfolio_params=dict(name="Max Sharpe"),
)
bench.fit(X_train)
bench.weights_

array([9.43631399e-02, 1.13184579e-06, 5.04970598e-07, 1.20834667e-01,
       3.18126275e-02, 8.57806907e-07, 7.11596802e-04, 1.24104939e-01,
       9.49223801e-07, 2.77547553e-02, 1.23409042e-06, 1.37593860e-06,
       1.16299875e-01, 5.73516411e-02, 9.58498589e-06, 1.09493919e-01,
       8.64761638e-02, 1.83992252e-01, 1.32350165e-02, 3.35537683e-02])

Prediction

We apply both fitted models to the test set and plot the cumulative returns of the resulting portfolios:

pred_model = model.predict(X_test)
pred_bench = bench.predict(X_test)

population = Population([pred_model, pred_bench])

fig = population.plot_cumulative_returns()
show(fig)
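
As a rough picture of what a cumulative returns curve represents, the sketch below applies fixed weights to out-of-sample returns with NumPy. It assumes the weights are held constant every day (implicit daily rebalancing) and shows both the simple-sum and the compounded conventions, since this tutorial does not state which one the plot uses; the data and weights are made up.

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical test-set returns (60 days, 4 assets) and fitted weights.
X_test = rng.normal(0.0005, 0.01, size=(60, 4))
weights = np.array([0.4, 0.3, 0.2, 0.1])

# Daily portfolio return is the weighted sum of the asset returns.
portfolio_returns = X_test @ weights

# Two common conventions for accumulating returns over time.
cumulative_simple = np.cumsum(portfolio_returns)
cumulative_compounded = np.cumprod(1.0 + portfolio_returns) - 1.0

print(cumulative_simple[-1], cumulative_compounded[-1])
```

Both curves start at the first day's portfolio return and diverge slowly as compounding effects build up.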
