Prior Estimator#

A Prior Estimator in skfolio estimates a ReturnDistribution containing your pre-optimization inputs (\(\mu\), \(\Sigma\), returns, sample weight, Cholesky factor).

The term “prior” is used in a general optimization sense, not confined to Bayesian priors. It denotes any a priori assumption or estimation method for the return distribution before optimization, unifying Frequentist, Bayesian and Information-theoretic approaches into a single cohesive framework:

  1. Frequentist: classical estimation, such as EmpiricalPrior, FactorModel and SyntheticData
  2. Bayesian: prior-to-posterior updating with views, such as BlackLitterman
  3. Information-theoretic: relative-entropy updating of scenario probabilities, such as EntropyPooling and OpinionPooling

In skfolio, all such methods share the same interface and adhere to scikit-learn’s estimator API: the fit method accepts X (the asset returns) and stores the resulting ReturnDistribution in its return_distribution_ attribute.

X can be any array-like structure (NumPy array, pandas DataFrame, etc.).

The ReturnDistribution is a dataclass containing:

  • mu: Estimated expected returns of shape (n_assets,)

  • covariance: Estimated covariance matrix of shape (n_assets, n_assets)

  • returns: (Estimated) asset returns of shape (n_observations, n_assets)

  • sample_weight: Sample weight for each observation of shape (n_observations,) (optional)

  • cholesky: Lower-triangular Cholesky factor of the covariance (optional)
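
For illustration, here is a minimal sketch of this shared interface using the EmpiricalPrior estimator described below; it fits on historical returns and reads the fields listed above:

from skfolio.datasets import load_sp500_dataset
from skfolio.preprocessing import prices_to_returns
from skfolio.prior import EmpiricalPrior

prices = load_sp500_dataset()
X = prices_to_returns(prices)

# Every Prior Estimator follows the same pattern: fit(X), then read return_distribution_
model = EmpiricalPrior()
model.fit(X)

dist = model.return_distribution_
print(dist.mu.shape)          # (n_assets,)
print(dist.covariance.shape)  # (n_assets, n_assets)
print(dist.returns.shape)     # (n_observations, n_assets)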

Note

The posterior of one model can serve as the prior for another. In skfolio, Prior Estimators can be composed into complex pre-optimization pipelines. For example, BlackLitterman accepts a Prior Estimator to compute the initial expected returns and covariance; it then applies the analyst’s views to update them and stores the resulting posterior expected returns and covariance in a new ReturnDistribution, which can be passed into another estimator.
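
As a minimal sketch of such a composition (assuming BlackLitterman exposes a prior_estimator parameter; see the Black & Litterman section below for the views syntax):

from skfolio.datasets import load_sp500_dataset
from skfolio.preprocessing import prices_to_returns
from skfolio.prior import BlackLitterman, EmpiricalPrior

prices = load_sp500_dataset()
X = prices_to_returns(prices)

# The EmpiricalPrior output serves as the prior of the Black & Litterman update;
# the prior_estimator parameter name is an assumption.
model = BlackLitterman(
    prior_estimator=EmpiricalPrior(),
    views=["MSFT == 0.0006"],
)
model.fit(X)
print(model.return_distribution_)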

Empirical Prior#

The EmpiricalPrior estimator estimates the ReturnDistribution by fitting its mu_estimator and covariance_estimator independently.

Example:

An EmpiricalPrior configured with James–Stein shrinkage to estimate expected returns and a denoising method to estimate the covariance matrix:

from skfolio.datasets import load_sp500_dataset
from skfolio.moments import DenoiseCovariance, ShrunkMu
from skfolio.preprocessing import prices_to_returns
from skfolio.prior import EmpiricalPrior

prices = load_sp500_dataset()
X = prices_to_returns(prices)

model = EmpiricalPrior(
    mu_estimator=ShrunkMu(), covariance_estimator=DenoiseCovariance()
)
model.fit(X)
print(model.return_distribution_)

Black & Litterman#

The BlackLitterman estimator estimates the ReturnDistribution using the Black & Litterman model. It takes a Bayesian approach by starting from a prior estimate of the assets’ expected returns and covariance matrix, then updating them with the analyst’s views to obtain the posterior estimates.
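
For reference, the standard Black & Litterman posterior mean takes the form below (a sketch in the usual notation, not necessarily skfolio’s exact parameterization):

\[
\mu_{\text{post}} = \left[(\tau\Sigma)^{-1} + P^\top \Omega^{-1} P\right]^{-1}
\left[(\tau\Sigma)^{-1} \pi + P^\top \Omega^{-1} Q\right]
\]

where \(\pi\) and \(\Sigma\) are the prior expected returns and covariance, \(P\) and \(Q\) encode the analyst’s views, \(\Omega\) is the view-uncertainty matrix and \(\tau\) a scaling parameter.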

Example:

from skfolio.preprocessing import prices_to_returns
from skfolio.datasets import load_sp500_dataset
from skfolio.prior import BlackLitterman

prices = load_sp500_dataset()
X = prices_to_returns(prices)

analyst_views = [
    "AAPL - BBY == 0.0003",
    "CVX - KO == 0.0004",
    "MSFT == 0.0006",
]

model = BlackLitterman(views=analyst_views)
model.fit(X)
print(model.return_distribution_)

Factor Model#

The FactorModel estimator estimates the ReturnDistribution by fitting a factor model on asset returns alongside a specified prior estimator for the factor returns.

The purpose of factor models is to impose a structure on financial variables and their covariance matrix by explaining them through a small number of common factors. This can help overcome estimation error by reducing the number of parameters, i.e., the dimensionality of the estimation problem, making portfolio optimization more robust against noise in the data. Factor models also provide a decomposition of financial risk into systematic and security-specific components.
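
Concretely, a factor model assumes the usual linear structure (standard notation, not skfolio-specific):

\[
R = \alpha + B F + \varepsilon, \qquad \Sigma = B \, \Sigma_F \, B^\top + D
\]

where \(R\) are the asset returns, \(F\) the factor returns, \(B\) the loading matrix (betas), \(\Sigma_F\) the factor covariance, and \(D\) the covariance of the security-specific residuals \(\varepsilon\). The term \(B \, \Sigma_F \, B^\top\) is the systematic component of risk and \(D\) the security-specific one.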

To be fully compatible with scikit-learn, the fit method takes X as the asset returns and y as the factor returns. Note that y is lowercase even when it is a 2D array (more than one factor); this is for consistency with the scikit-learn API.

Example:

from skfolio.datasets import load_factors_dataset, load_sp500_dataset
from skfolio.preprocessing import prices_to_returns
from skfolio.prior import FactorModel

prices = load_sp500_dataset()
factor_prices = load_factors_dataset()
X, y = prices_to_returns(prices, factor_prices)

model = FactorModel()
model.fit(X, y)
print(model.return_distribution_)

The loading matrix (betas) of the factors is estimated using a loading_matrix_estimator. By default, it uses LoadingMatrixRegression, which regresses each asset separately on the factors using sklearn.linear_model.LassoCV.
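
If a different regression is preferred for the betas, the loading-matrix estimator can be customized. A hedged sketch, assuming LoadingMatrixRegression is importable from skfolio.prior and exposes a linear_regressor parameter:

from sklearn.linear_model import RidgeCV

from skfolio.datasets import load_factors_dataset, load_sp500_dataset
from skfolio.preprocessing import prices_to_returns
from skfolio.prior import FactorModel, LoadingMatrixRegression

prices = load_sp500_dataset()
factor_prices = load_factors_dataset()
X, y = prices_to_returns(prices, factor_prices)

# Replace the default LassoCV with a ridge regression for the factor loadings;
# the linear_regressor parameter name is an assumption.
model = FactorModel(
    loading_matrix_estimator=LoadingMatrixRegression(linear_regressor=RidgeCV())
)
model.fit(X, y)
print(model.return_distribution_)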

Synthetic Data#

The SyntheticData estimator bridges scenario generation and portfolio optimization. It estimates the ReturnDistribution by fitting a distribution_estimator and sampling new data from it.

The default distribution_estimator is VineCopula (a Regular Vine Copula). Other common choices are Generative Adversarial Networks (GANs) or Variational Autoencoders (VAEs).

It is particularly useful when the historical distribution has sparse tail dependencies that need extrapolation for tail-risk optimization, or when optimizing under conditional or stressed scenarios.

Example:

from skfolio.datasets import load_sp500_dataset
from skfolio.preprocessing import prices_to_returns
from skfolio.distribution import VineCopula
from skfolio.optimization import MeanRisk
from skfolio.prior import SyntheticData
from skfolio import RiskMeasure

# Load historical prices and convert them to returns
prices = load_sp500_dataset()
X = prices_to_returns(prices)

# Instantiate the SyntheticData model and fit it
model = SyntheticData()
model.fit(X)
print(model.return_distribution_)

# Minimum CVaR optimization on synthetic returns
vine = VineCopula(log_transform=True, n_jobs=-1)
prior = SyntheticData(distribution_estimator=vine, n_samples=2000)
model = MeanRisk(risk_measure=RiskMeasure.CVAR, prior_estimator=prior)
model.fit(X)
print(model.weights_)

# Stress Test
vine = VineCopula(log_transform=True, central_assets=["BAC"], n_jobs=-1)
vine.fit(X)
X_stressed = vine.sample(n_samples=10000, conditioning={"BAC": -0.2})
ptf_stressed = model.predict(X_stressed)

Entropy Pooling#

EntropyPooling, introduced by Attilio Meucci in 2008 as a generalization of the Black-Litterman framework, is a nonparametric method for adjusting a baseline (“prior”) probability distribution to incorporate user-defined views. It does so by finding the posterior distribution closest to the prior that satisfies those views.

User-defined views can be elicited from domain experts or derived from quantitative analyses.

Grounded in information theory, it updates the distribution in the least-informative way by minimizing the Kullback-Leibler divergence (relative entropy) under the specified view constraints.
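
In the standard formulation (a sketch in common notation), Entropy Pooling solves

\[
q^{*} = \arg\min_{q} \; \sum_{i=1}^{T} q_i \ln \frac{q_i}{p_i}
\quad \text{subject to the view constraints,} \quad \sum_{i=1}^{T} q_i = 1, \quad q \ge 0
\]

where \(p\) are the prior scenario probabilities (typically uniform over the \(T\) observations) and \(q\) are the posterior probabilities stored as sample_weight in the ReturnDistribution.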

Example:

from skfolio import RiskMeasure
from skfolio.datasets import load_sp500_dataset
from skfolio.preprocessing import prices_to_returns
from skfolio.prior import EntropyPooling
from skfolio.optimization import HierarchicalRiskParity

prices = load_sp500_dataset()
prices = prices[["AMD", "BAC", "GE", "JNJ", "JPM", "LLY", "PG"]]
X = prices_to_returns(prices)

groups = {
    "AMD": ["Technology", "Growth"],
    "BAC": ["Financials", "Value"],
    "GE": ["Industrials", "Value"],
    "JNJ": ["Healthcare", "Defensive"],
    "JPM": ["Financials", "Income"],
    "LLY": ["Healthcare", "Defensive"],
    "PG": ["Consumer", "Defensive"],
}

entropy_pooling = EntropyPooling(
    mean_views=[
        "JPM == -0.002",
        "PG >= LLY",
        "BAC >= prior(BAC) * 1.2",
        "Financials == 2 * Growth",
    ],
    variance_views=[
        "BAC == prior(BAC) * 4",
    ],
    correlation_views=[
        "(BAC,JPM) == 0.80",
        "(BAC,JNJ) <= prior(BAC,JNJ) * 0.5",
    ],
    skew_views=[
        "BAC == -0.05",
    ],
    cvar_views=[
        "GE == 0.08",
    ],
    cvar_beta=0.90,
    groups=groups,
)

entropy_pooling.fit(X)

print(entropy_pooling.relative_entropy_)
print(entropy_pooling.effective_number_of_scenarios_)
print(entropy_pooling.return_distribution_.sample_weight)

# CVaR Hierarchical Risk Parity optimization on Entropy Pooling
model = HierarchicalRiskParity(
    risk_measure=RiskMeasure.CVAR,
    prior_estimator=entropy_pooling
)
model.fit(X)
print(model.weights_)

# Stress Test the Portfolio
entropy_pooling = EntropyPooling(cvar_views=["AMD == 0.10"])
entropy_pooling.fit(X)

stressed_dist = entropy_pooling.return_distribution_

stressed_ptf = model.predict(stressed_dist)

Opinion Pooling#

OpinionPooling (also called Belief Aggregation or Risk Aggregation) is a process in which different probability distributions (opinions), produced by different experts, are combined to yield a single probability distribution (consensus).

Expert opinions (also called individual prior distributions) can be elicited from domain experts or derived from quantitative analyses.

The OpinionPooling estimator takes a list of prior estimators, each of which produces scenario probabilities (sample_weight), and pools them into a single consensus probability distribution.

You can choose between linear (arithmetic) pooling or logarithmic (geometric) pooling, and optionally apply robust pooling using a Kullback-Leibler divergence penalty to down-weight experts whose views deviate strongly from the group consensus.
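
For reference, given expert scenario probabilities \(q^{(k)}\) and opinion weights \(w_k\), the two pooling rules are (standard definitions):

\[
\text{linear:} \quad q_i = \sum_{k} w_k \, q_i^{(k)},
\qquad
\text{logarithmic:} \quad q_i \propto \prod_{k} \left(q_i^{(k)}\right)^{w_k}
\]

with the logarithmic consensus renormalized to sum to one. As in the example below, when the opinion probabilities sum to less than one, the remaining weight is assigned to the prior distribution.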

Example:

from skfolio import RiskMeasure
from skfolio.datasets import load_sp500_dataset
from skfolio.preprocessing import prices_to_returns
from skfolio.prior import EntropyPooling, OpinionPooling
from skfolio.optimization import RiskBudgeting

prices = load_sp500_dataset()
X = prices_to_returns(prices)

# We consider two expert opinions, each generated via Entropy Pooling with
# user-defined views.
# We assign probabilities of 40% to Expert 1, 50% to Expert 2, and by default
# the remaining 10% is allocated to the prior distribution:
opinion_1 = EntropyPooling(cvar_views=["AMD == 0.10"])
opinion_2 = EntropyPooling(
    mean_views=["AMD >= BAC", "JPM <= prior(JPM) * 0.8"],
    cvar_views=["GE == 0.12"],
)

opinion_pooling = OpinionPooling(
    estimators=[("opinion_1", opinion_1), ("opinion_2", opinion_2)],
    opinion_probabilities=[0.4, 0.5],
)

opinion_pooling.fit(X)

print(opinion_pooling.return_distribution_.sample_weight)

# CVaR Risk Parity optimization on Opinion Pooling
model = RiskBudgeting(
    risk_measure=RiskMeasure.CVAR,
    prior_estimator=opinion_pooling
)
model.fit(X)
print(model.weights_)

# Stress Test the Portfolio
opinion_1 = EntropyPooling(cvar_views=["AMD == 0.05"])
opinion_2 = EntropyPooling(cvar_views=["AMD == 0.10"])
opinion_pooling = OpinionPooling(
    estimators=[("opinion_1", opinion_1), ("opinion_2", opinion_2)],
    opinion_probabilities=[0.6, 0.4],
)
opinion_pooling.fit(X)

stressed_dist = opinion_pooling.return_distribution_

stressed_ptf = model.predict(stressed_dist)

Combining Multiple Prior Estimators#

Prior estimators can be composed to build more sophisticated models. For example, you can create a Black & Litterman Factor Model by supplying BlackLitterman as the prior estimator of the FactorModel and impose views on the factors.

Example:

Below is a factor model that estimates the assets’ expected returns and covariance matrix, where the factors’ expected returns and covariance are themselves estimated via a Black & Litterman model that incorporates the analyst’s views on those factors.

from skfolio.datasets import load_factors_dataset, load_sp500_dataset
from skfolio.preprocessing import prices_to_returns
from skfolio.prior import BlackLitterman, FactorModel

prices = load_sp500_dataset()
factor_prices = load_factors_dataset()
X, y = prices_to_returns(prices, factor_prices)

views = [
    "MTUM - QUAL == 0.0003",
    "SIZE - USMV == 0.0004",
    "VLUE == 0.0006",
]

model = FactorModel(
    factor_prior_estimator=BlackLitterman(views=views),
)
model.fit(X, y)
print(model.return_distribution_)

Example:

By combining SyntheticData with FactorModel, you can generate synthetic factor data and project it onto your assets. This is often used for factor stress tests.

from skfolio.datasets import load_sp500_dataset, load_factors_dataset
from skfolio.preprocessing import prices_to_returns
from skfolio.distribution import VineCopula
from skfolio.optimization import MeanRisk
from skfolio.prior import FactorModel, SyntheticData
from skfolio import RiskMeasure

# Load historical prices and convert them to returns
prices = load_sp500_dataset()
factors = load_factors_dataset()
X, y = prices_to_returns(prices, factors)


# Minimum CVaR optimization on Stressed Factors
vine = VineCopula(central_assets=["QUAL"], log_transform=True, n_jobs=-1)
factor_prior = SyntheticData(
    distribution_estimator=vine,
    n_samples=10000,
    sample_args=dict(conditioning={"QUAL": -0.2}),
)
factor_model = FactorModel(factor_prior_estimator=factor_prior)
model = MeanRisk(risk_measure=RiskMeasure.CVAR, prior_estimator=factor_model)
model.fit(X, y)
print(model.weights_)

# Stress Test the Portfolio
factor_model.set_params(factor_prior_estimator__sample_args=dict(
    conditioning={"QUAL": -0.5}
))
factor_model.fit(X, y)
stressed_dist = factor_model.return_distribution_
stressed_ptf = model.predict(stressed_dist)

Example:

To impose extreme views using Entropy Pooling on a sparse historical distribution, we must generate synthetic data capable of extrapolating tail dependencies. This can be achieved by combining EntropyPooling with SyntheticData:

from skfolio.datasets import load_sp500_dataset
from skfolio.preprocessing import prices_to_returns
from skfolio.distribution import VineCopula
from skfolio.prior import EntropyPooling, SyntheticData

# Load historical prices and convert them to returns
prices = load_sp500_dataset()
X = prices_to_returns(prices)

# Regular Vine Copula and sampling of 100,000 synthetic returns
synth = SyntheticData(
    n_samples=100_000,
    distribution_estimator=VineCopula(log_transform=True, n_jobs=-1, random_state=0)
)

# Entropy Pooling by imposing a CVaR-95% of 10% on Apple
entropy_pooling = EntropyPooling(
    prior_estimator=synth,
    cvar_views=["AAPL == 0.10"],
)

entropy_pooling.fit(X)

Example:

Instead of applying extreme Entropy Pooling views directly to asset returns, we can embed them within a Factor Model. This allows us to impose views on factor data, such as the quality factor “QUAL”. This can be achieved by combining EntropyPooling with SyntheticData and FactorModel:

from skfolio.datasets import load_sp500_dataset, load_factors_dataset
from skfolio.preprocessing import prices_to_returns
from skfolio.distribution import VineCopula
from skfolio.prior import EntropyPooling, FactorModel, SyntheticData

# Load historical prices and convert them to returns
prices = load_sp500_dataset()
factor_prices = load_factors_dataset()
X, factors = prices_to_returns(prices, factor_prices)

# Regular Vine Copula and sampling of 100,000 synthetic factor returns
factor_synth = SyntheticData(
    n_samples=100_000,
    distribution_estimator=VineCopula(log_transform=True, n_jobs=-1, random_state=0)
)

# Entropy Pooling by imposing a CVaR-95% of 10% on the Quality factor
factor_entropy_pooling = EntropyPooling(
    prior_estimator=factor_synth,
    cvar_views=["QUAL == 0.10"],
)

# Project the stressed factor distribution onto the assets via the Factor Model
factor_model = FactorModel(factor_prior_estimator=factor_entropy_pooling)
factor_model.fit(X, factors)
print(factor_model.return_distribution_)