Prior Estimator

A prior estimator fits a PriorModel containing the distribution estimate of asset returns. It represents the investor’s prior beliefs about the model used to estimate that distribution.

A prior estimator follows the same API as scikit-learn estimators: the fit method takes X as the asset returns and stores the PriorModel in its prior_model_ attribute.

X can be any array-like structure (numpy array, pandas DataFrame, etc.).


The prior of one model can be the posterior of another one. For example, BlackLitterman takes as input a prior estimator used to compute the prior expected returns and prior covariance matrix, which are updated using the analyst’s views to get the posterior expected returns and posterior covariance matrix. These posterior estimates will be saved in a new PriorModel that can be used in another estimator.

The PriorModel is a dataclass containing:

  • mu: Expected returns estimation

  • covariance: Covariance matrix estimation

  • returns: Asset returns estimation

  • cholesky: Lower-triangular Cholesky factor of the covariance estimation (optional)
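
To make the relationship between these fields concrete, here is a minimal sketch of a container with the same four fields. `PriorModelSketch` is a hypothetical name used only for this illustration and is not skfolio's class; the numbers are made up.

```python
from dataclasses import dataclass
from typing import Optional

import numpy as np


@dataclass(frozen=True)
class PriorModelSketch:
    """Hypothetical stand-in with the same four fields as the list above."""

    mu: np.ndarray  # expected returns, shape (n_assets,)
    covariance: np.ndarray  # shape (n_assets, n_assets)
    returns: np.ndarray  # shape (n_observations, n_assets)
    cholesky: Optional[np.ndarray] = None  # lower-triangular factor, optional


# Made-up numbers for two assets
covariance = np.array([[0.04, 0.01], [0.01, 0.09]])
prior = PriorModelSketch(
    mu=np.array([0.001, 0.002]),
    covariance=covariance,
    returns=np.array([[0.01, -0.02], [0.00, 0.03], [-0.01, 0.01]]),
    cholesky=np.linalg.cholesky(covariance),  # L with L @ L.T == covariance
)
```

The optional Cholesky factor is redundant with the covariance matrix, so carrying it only saves recomputing the factorization downstream.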

Empirical Prior

The EmpiricalPrior estimator estimates the PriorModel by fitting a mu_estimator and a covariance_estimator separately.


Empirical prior with James-Stein shrinkage for the estimation of expected returns and Denoising for the estimation of the covariance matrix:

from skfolio.datasets import load_sp500_dataset
from skfolio.moments import DenoiseCovariance, ShrunkMu
from skfolio.preprocessing import prices_to_returns
from skfolio.prior import EmpiricalPrior

prices = load_sp500_dataset()
X = prices_to_returns(prices)

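model = EmpiricalPrior(
    mu_estimator=ShrunkMu(), covariance_estimator=DenoiseCovariance()
)
# Fit on the asset returns; the result is stored in prior_model_
model.fit(X)
print(model.prior_model_)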
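
The idea of fitting the two moments separately can be sketched in plain NumPy. All function names below are hypothetical helpers for this illustration, and `shrunk_mean` uses a fixed shrinkage intensity toward the grand mean, which is a simplification and not the data-driven James-Stein intensity.

```python
import numpy as np


def sample_mean(returns):
    """Plain sample mean of each asset."""
    return returns.mean(axis=0)


def shrunk_mean(returns, intensity=0.5):
    """Toy shrinkage: pull each sample mean toward the grand mean by a fixed intensity."""
    mu = returns.mean(axis=0)
    return (1.0 - intensity) * mu + intensity * mu.mean()


def sample_covariance(returns):
    """Unbiased sample covariance matrix."""
    return np.cov(returns, rowvar=False)


def fit_moments(returns, mu_estimator, covariance_estimator):
    """Run the two estimators independently on the same returns."""
    return mu_estimator(returns), covariance_estimator(returns)


# Toy returns: 4 observations of 2 assets (made-up numbers)
toy_returns = np.array([[0.01, 0.03], [-0.01, 0.01], [0.02, 0.02], [0.00, 0.00]])

mu_plain, cov = fit_moments(toy_returns, sample_mean, sample_covariance)
mu_shrunk, _ = fit_moments(toy_returns, shrunk_mean, sample_covariance)
```

Because the two estimators never see each other's output, either one can be swapped without touching the other, which is the design the mu_estimator and covariance_estimator parameters expose.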

Black & Litterman

The BlackLitterman estimator estimates the PriorModel using the Black & Litterman model. It takes a Bayesian approach: a prior estimate of the assets' expected returns and covariance matrix is updated with the analyst's views to obtain the posterior estimates.


from skfolio.preprocessing import prices_to_returns
from skfolio.datasets import load_sp500_dataset
from skfolio.prior import BlackLitterman

prices = load_sp500_dataset()
X = prices_to_returns(prices)

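analyst_views = [
    "AAPL - BBY == 0.0003",
    "CVX - KO == 0.0004",
    "MSFT == 0.0006",
]

model = BlackLitterman(views=analyst_views)
# Fit on the asset returns; the posterior estimates are stored in prior_model_
model.fit(X)
print(model.prior_model_)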
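
For intuition, the update behind such views can be sketched with the textbook Black & Litterman formulas. This is an illustration under stated assumptions, not skfolio's implementation: `black_litterman_posterior` is a hypothetical helper, `tau=0.05` and the diagonal view-uncertainty matrix are common conventions chosen for the example, and the prior moments are made up.

```python
import numpy as np


def black_litterman_posterior(prior_mu, prior_cov, P, Q, tau=0.05):
    """Textbook Black & Litterman update of prior moments given views P @ mu == Q."""
    scaled_cov = tau * prior_cov
    # View uncertainty: diagonal of P (tau * Sigma) P^T (a common convention, assumed here)
    omega = np.diag(np.diag(P @ scaled_cov @ P.T))
    # Gain applied to the gap between the views and what the prior implies
    gain = scaled_cov @ P.T @ np.linalg.inv(P @ scaled_cov @ P.T + omega)
    posterior_mu = prior_mu + gain @ (Q - P @ prior_mu)
    posterior_cov = prior_cov + scaled_cov - gain @ P @ scaled_cov
    return posterior_mu, posterior_cov


assets = ["AAPL", "BBY", "MSFT"]
prior_mu = np.array([0.0005, 0.0004, 0.0005])  # made-up daily expected returns
prior_cov = np.array(
    [
        [4.0e-4, 1.0e-4, 2.0e-4],
        [1.0e-4, 6.0e-4, 1.0e-4],
        [2.0e-4, 1.0e-4, 3.0e-4],
    ]
)
# "AAPL - BBY == 0.0003" and "MSFT == 0.0006" as rows of P with targets Q
P = np.array([[1.0, -1.0, 0.0], [0.0, 0.0, 1.0]])
Q = np.array([0.0003, 0.0006])

posterior_mu, posterior_cov = black_litterman_posterior(prior_mu, prior_cov, P, Q)
```

Each view string maps to one row of the picking matrix P (coefficients on the assets) and one entry of Q (the target value). With this choice of view uncertainty, the posterior view values `P @ posterior_mu` land between what the prior implies and what the analyst states.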

Factor Model

The FactorModel estimator estimates the PriorModel using a factor model and a prior estimator of the factor returns.

The purpose of factor models is to impose a structure on financial variables and their covariance matrix by explaining them through a small number of common factors. This can help overcome estimation error by reducing the number of parameters, i.e., the dimensionality of the estimation problem, making portfolio optimization more robust against noise in the data. Factor models also provide a decomposition of financial risk into systematic and security-specific components.

To be fully compatible with scikit-learn, the fit method takes X as the asset returns and y as the factor returns. Note that y is in lowercase even for a 2D array (more than one factor). This is for consistency with the scikit-learn API.


from skfolio.datasets import load_factors_dataset, load_sp500_dataset
from skfolio.preprocessing import prices_to_returns
from skfolio.prior import FactorModel

prices = load_sp500_dataset()
factor_prices = load_factors_dataset()
X, y = prices_to_returns(prices, factor_prices)

model = FactorModel()
# X holds the asset returns, y the factor returns
model.fit(X, y)
print(model.prior_model_)

The loading matrix (betas) of the factors is estimated using a loading_matrix_estimator. By default, we use the LoadingMatrixRegression which fits the factors using a sklearn.linear_model.LassoCV on each asset separately.
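
The mechanics can be sketched in NumPy: regress each asset on the factors, then map the factor moments to asset moments through the loading matrix. This sketch deliberately swaps LassoCV for ordinary least squares to stay dependency-free, uses simulated data, and adds the residual variances on the diagonal as in the standard factor-model decomposition; treat all three as assumptions of the example.

```python
import numpy as np

rng = np.random.default_rng(42)
n_obs, n_factors, n_assets = 500, 2, 4

# Simulated data (made up): assets driven by two factors plus idiosyncratic noise
true_betas = rng.uniform(0.5, 1.5, size=(n_assets, n_factors))
factor_returns = rng.normal(0.0004, 0.01, size=(n_obs, n_factors))
noise = rng.normal(0.0, 0.005, size=(n_obs, n_assets))
asset_returns = factor_returns @ true_betas.T + noise

# Regress each asset on the factors (ordinary least squares with an intercept)
design = np.column_stack([np.ones(n_obs), factor_returns])
coefs, *_ = np.linalg.lstsq(design, asset_returns, rcond=None)
intercepts, betas = coefs[0], coefs[1:].T  # betas: (n_assets, n_factors)

# Factor moments, here simply the sample estimates
factor_mu = factor_returns.mean(axis=0)
factor_cov = np.cov(factor_returns, rowvar=False)

# Asset moments implied by the factor structure
residuals = asset_returns - design @ coefs
asset_mu = intercepts + betas @ factor_mu
asset_cov = betas @ factor_cov @ betas.T + np.diag(residuals.var(axis=0, ddof=1))
```

The asset covariance is built from a 2x2 factor covariance, a 4x2 loading matrix, and 4 residual variances, far fewer free parameters than a full sample covariance would need as the number of assets grows.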

Synthetic Data

The SyntheticData estimator bridges scenario generation and portfolio optimization. It estimates the PriorModel by fitting a distribution_estimator and sampling new data from it.

The default distribution_estimator is a Regular VineCopula estimator. Other common choices are Generative Adversarial Networks (GANs) or Variational Autoencoders (VAEs).

It is particularly useful when the historical distribution tail dependencies are sparse and need extrapolation for tail optimizations or when optimizing under conditional or stressed scenarios.



from skfolio.datasets import load_sp500_dataset, load_factors_dataset
from skfolio.preprocessing import prices_to_returns
from skfolio.distribution import VineCopula
from skfolio.optimization import MeanRisk
from skfolio.prior import FactorModel, SyntheticData
from skfolio import RiskMeasure

# Load historical prices and convert them to returns
prices = load_sp500_dataset()
X = prices_to_returns(prices)

# Instantiate the SyntheticData model and fit it
model = SyntheticData()
model.fit(X)
print(model.prior_model_)

# Minimum CVaR optimization on synthetic returns
vine = VineCopula(log_transform=True, n_jobs=-1)
prior = SyntheticData(distribution_estimator=vine, n_samples=2000)
model = MeanRisk(risk_measure=RiskMeasure.CVAR, prior_estimator=prior)
model.fit(X)
print(model.weights_)

# Stress test: sample returns conditioned on BAC and evaluate the fitted portfolio
vine = VineCopula(log_transform=True, central_assets=["BAC"], n_jobs=-1)
vine.fit(X)
X_stressed = vine.sample(n_samples=10000, conditioning={"BAC": -0.2})
ptf_stressed = model.predict(X_stressed)
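
The fit, sample, and condition steps can be sketched without a copula. The example below swaps the vine copula for a multivariate normal distribution, which is a much simpler model with no tail dependence, purely to show the workflow; the data are simulated and the conditional step uses the closed-form Gaussian conditional distribution.

```python
import numpy as np

rng = np.random.default_rng(7)

# Made-up historical returns for three assets
historical = rng.multivariate_normal(
    mean=[0.0005, 0.0003, 0.0004],
    cov=[[4e-4, 2e-4, 1e-4], [2e-4, 5e-4, 1e-4], [1e-4, 1e-4, 3e-4]],
    size=750,
)

# 1. Fit the distribution (here: Gaussian parameters)
fitted_mean = historical.mean(axis=0)
fitted_cov = np.cov(historical, rowvar=False)

# 2. Sample synthetic returns from the fitted distribution
synthetic = rng.multivariate_normal(fitted_mean, fitted_cov, size=2000)

# 3. Conditional sampling: fix asset 0 at -0.2 and sample the others
shock = -0.2
cross_cov = fitted_cov[1:, 0]  # covariance of the free assets with asset 0
cond_mean = fitted_mean[1:] + cross_cov / fitted_cov[0, 0] * (shock - fitted_mean[0])
cond_cov = fitted_cov[1:, 1:] - np.outer(cross_cov, cross_cov) / fitted_cov[0, 0]
stressed = np.column_stack(
    [np.full(1000, shock), rng.multivariate_normal(cond_mean, cond_cov, size=1000)]
)
```

In the stressed sample the conditioned asset is pinned at the shock value, and the positively correlated assets shift downward with it. A vine copula plays the same role while also capturing asymmetric and tail dependence that a Gaussian cannot.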

Combining Multiple Prior Estimators

Prior estimators can be combined. For example, it is possible to create a Black & Litterman Factor Model by using a BlackLitterman estimator for the prior estimator of the FactorModel:


Factor model for the estimation of the assets' expected returns and covariance matrix, with a Black & Litterman model for the estimation of the factors' expected returns and covariance matrix, incorporating the analyst's views on the factors:

from skfolio.datasets import load_factors_dataset, load_sp500_dataset
from skfolio.preprocessing import prices_to_returns
from skfolio.prior import BlackLitterman, FactorModel

prices = load_sp500_dataset()
factor_prices = load_factors_dataset()
X, y = prices_to_returns(prices, factor_prices)

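views = [
    "MTUM - QUAL == 0.0003",
    "SIZE - USMV == 0.0004",
    "VLUE == 0.0006",
]

# Black & Litterman acts as the prior estimator of the factor returns
model = FactorModel(
    factor_prior_estimator=BlackLitterman(views=views),
)
model.fit(X, y)
print(model.prior_model_)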


By combining SyntheticData with FactorModel, you can generate synthetic factor data and then project it onto your assets. This is often used for factor stress tests.

from skfolio.datasets import load_sp500_dataset, load_factors_dataset
from skfolio.preprocessing import prices_to_returns
from skfolio.distribution import VineCopula
from skfolio.optimization import MeanRisk
from skfolio.prior import FactorModel, SyntheticData
from skfolio import RiskMeasure

# Load historical prices and convert them to returns
prices = load_sp500_dataset()
factors = load_factors_dataset()
X, y = prices_to_returns(prices, factors)

# Minimum CVaR optimization on stressed factors
vine = VineCopula(central_assets=["QUAL"], log_transform=True, n_jobs=-1)
factor_prior = SyntheticData(
    distribution_estimator=vine,
    sample_args=dict(conditioning={"QUAL": -0.2}),
)
factor_model = FactorModel(factor_prior_estimator=factor_prior)
model = MeanRisk(risk_measure=RiskMeasure.CVAR, prior_estimator=factor_model)
model.fit(X, y)
print(model.weights_)

# Stress test the portfolio under a harsher factor scenario
factor_model.set_params(
    factor_prior_estimator__sample_args=dict(conditioning={"QUAL": -0.5})
)
factor_model.fit(X, y)
stressed_X = factor_model.prior_model_.returns
stressed_ptf = model.predict(stressed_X)
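
The projection step of a factor stress test reduces to one matrix product. The sketch below assumes a hypothetical, already-estimated loading matrix and made-up stressed factor scenarios, and it ignores asset-specific residual noise for brevity.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical fitted loading matrix: 3 assets exposed to 2 factors
betas = np.array([[1.2, 0.3], [0.8, -0.1], [0.5, 0.9]])
intercepts = np.array([0.0001, 0.0002, 0.0000])

# Made-up stressed factor scenarios: factor 0 pinned at -0.5, factor 1 sampled
n_scenarios = 1000
stressed_factors = np.column_stack(
    [np.full(n_scenarios, -0.5), rng.normal(-0.1, 0.05, size=n_scenarios)]
)

# Project factor scenarios onto the assets: r = intercept + B f
stressed_assets = intercepts + stressed_factors @ betas.T

# Evaluate a fixed portfolio on the stressed scenarios
weights = np.array([0.5, 0.3, 0.2])
stressed_portfolio_returns = stressed_assets @ weights
```

Each row of `stressed_assets` is one asset-level scenario implied by a factor scenario, so a shock defined on a handful of factors propagates to every asset through its betas.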