Opinion Pooling#
This tutorial introduces the OpinionPooling estimator.
Introduction#
Opinion Pooling (also called Belief Aggregation or Risk Aggregation) is a process in which different probability distributions (opinions), produced by different experts, are combined to yield a single probability distribution (consensus).
Expert opinions (also called individual prior distributions) can be elicited from domain experts or derived from quantitative analyses.
The OpinionPooling estimator takes a list of prior estimators, each of which produces scenario probabilities (sample_weight), and pools them into a single consensus probability distribution.
You can choose between linear (arithmetic) pooling or logarithmic (geometric) pooling, and optionally apply robust pooling using a Kullback-Leibler divergence penalty to down-weight experts whose views deviate strongly from the group.
Linear Opinion Pooling#
- Retains all nonzero support: no “zero-forcing”.
- Produces an average that is more evenly spread across all expert opinions.
Logarithmic Opinion Pooling#
- Zero-Preservation: any scenario assigned zero probability by any expert remains zero in the aggregate.
- Information-Theoretic Optimality: yields the distribution that minimizes the weighted sum of KL divergences from each expert’s distribution.
- Robust to Extremes: down-weights extreme or contrarian views more severely.
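To make the contrast concrete, here is a toy NumPy illustration (independent of the skfolio API) pooling two expert distributions over three scenarios with weights of 60% and 40%:
import numpy as np

# Two expert opinions over three scenarios.
p1 = np.array([0.5, 0.5, 0.0])  # Expert 1 rules out scenario 3
p2 = np.array([0.2, 0.3, 0.5])
w1, w2 = 0.6, 0.4

# Linear (arithmetic) pooling: weighted average; scenario 3 keeps positive mass.
linear = w1 * p1 + w2 * p2  # -> [0.38, 0.42, 0.20]

# Logarithmic (geometric) pooling: weighted geometric mean, renormalized;
# scenario 3 stays at exactly zero because Expert 1 ruled it out.
log_pool = p1**w1 * p2**w2
log_pool /= log_pool.sum()  # -> approx. [0.46, 0.54, 0.00]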
Robust Pooling with Divergence Penalty#
By specifying a divergence_penalty, you can penalize each opinion’s divergence from the group consensus, yielding a more robust aggregate distribution.
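A minimal, unfitted sketch of how this could look (the example views and the penalty value of 1.0 are purely illustrative):
from skfolio.prior import EntropyPooling, OpinionPooling

robust_pooling = OpinionPooling(
    estimators=[
        ("opinion_1", EntropyPooling(cvar_views=["AMD == 0.10"])),
        ("opinion_2", EntropyPooling(cvar_views=["GE == 0.12"])),
    ],
    opinion_probabilities=[0.5, 0.5],
    divergence_penalty=1.0,
)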
In this tutorial, we will:
- Apply Opinion Pooling to historical return data.
- Construct portfolios based on the adjusted distribution.
- Demonstrate factor-based and synthetic-data-enhanced Opinion Pooling.
- Perform stress tests using Opinion Pooling.
Data Loading and Preparation#
We load the S&P 500 dataset and select seven stocks (for demonstration purposes). We also load the factors dataset, composed of daily prices for five ETFs representing common factors.
import numpy as np
import pandas as pd
from plotly.io import show
from skfolio import Population, RiskMeasure
from skfolio.datasets import load_factors_dataset, load_sp500_dataset
from skfolio.distribution import VineCopula
from skfolio.measures import (
cvar,
kurtosis,
mean,
skew,
standard_deviation,
value_at_risk,
)
from skfolio.optimization import HierarchicalRiskParity, RiskBudgeting
from skfolio.preprocessing import prices_to_returns
from skfolio.prior import EntropyPooling, FactorModel, OpinionPooling, SyntheticData
from skfolio.utils.figure import plot_kde_distributions
# Load stock price and factor data
prices = load_sp500_dataset()
prices = prices[["AMD", "BAC", "GE", "JNJ", "JPM", "LLY", "PG"]]
factor_prices = load_factors_dataset()
# Convert to daily returns
X, factors = prices_to_returns(prices, factor_prices)
print("Shapes:")
print(f"X: {X.shape}")
print(f"factors: {factors.shape}")
print(X.tail())
print(factors.tail())
Shapes:
X: (2263, 7)
factors: (2263, 5)
AMD BAC GE ... JPM LLY PG
Date ...
2022-12-21 0.040430 0.015223 0.033001 ... 0.011248 0.023275 0.009170
2022-12-22 -0.056442 -0.008848 -0.014582 ... -0.011355 -0.007339 0.002308
2022-12-23 0.010335 0.002443 0.000235 ... 0.004749 0.007090 0.002825
2022-12-27 -0.019374 0.001875 0.012849 ... 0.003504 -0.008208 0.008713
2022-12-28 -0.011064 0.007360 -0.010502 ... 0.005463 0.000932 -0.012926
[5 rows x 7 columns]
MTUM QUAL SIZE USMV VLUE
Date
2022-12-21 0.014312 0.017884 0.014371 0.012005 0.013246
2022-12-22 -0.010977 -0.015411 -0.012070 -0.007315 -0.011989
2022-12-23 0.010897 0.005889 0.006287 0.005281 0.005844
2022-12-27 0.001770 -0.003138 -0.001320 0.001798 -0.000111
2022-12-28 -0.011778 -0.013325 -0.013914 -0.010489 -0.015238
Summary Statistics#
We create a helper function to compute key return statistics, optionally weighted by sample probabilities:
def summary(X: pd.DataFrame, sample_weight: np.ndarray | None = None) -> pd.DataFrame:
return pd.DataFrame(
{
"Mean": mean(X, sample_weight=sample_weight),
"Volatility": standard_deviation(X, sample_weight=sample_weight),
"Skew": skew(X, sample_weight=sample_weight),
"Kurtosis": kurtosis(X, sample_weight=sample_weight),
"VaR at 95%": value_at_risk(X, beta=0.95, sample_weight=sample_weight),
"CVaR at 95%": cvar(X, beta=0.95, sample_weight=sample_weight),
}
)
summary(X)
Expert Opinions#
We consider two expert opinions, each generated via Entropy Pooling with user-defined views. We assign probabilities of 40% to Expert 1, 50% to Expert 2, and by default the remaining 10% is allocated to the prior distribution:
opinion_1 = EntropyPooling(cvar_views=["AMD == 0.10"])
opinion_2 = EntropyPooling(
mean_views=["AMD >= BAC", "JPM <= prior(JPM) * 0.8"],
cvar_views=["GE == 0.12"],
)
opinion_pooling = OpinionPooling(
estimators=[("opinion_1", opinion_1), ("opinion_2", opinion_2)],
opinion_probabilities=[0.4, 0.5],
)
opinion_pooling.fit(X)
sample_weight = opinion_pooling.return_distribution_.sample_weight
summary(X, sample_weight=sample_weight)
Let’s plot the prior versus the posterior returns distributions for each asset:
plot_kde_distributions(
X,
sample_weight=sample_weight,
percentile_cutoff=0.05,
title="Distribution of Asset Returns (Prior vs. Posterior)",
unweighted_suffix="Prior",
weighted_suffix="Posterior",
)
Building a Portfolio based on Opinion Pooling#
Now that we’ve shown how the Opinion Pooling estimator works in isolation, let’s see how to build a risk parity portfolio that uses CVaR at 90% as the risk measure and Opinion Pooling as its prior estimator:
model = RiskBudgeting(
risk_measure=RiskMeasure.CVAR, cvar_beta=0.9, prior_estimator=opinion_pooling
)
model.fit(X)
print(model.weights_)
[0.0808561 0.09789884 0.0972093 0.21843296 0.10682522 0.17225747
0.22652011]
Factor Opinion Pooling#
Instead of applying Opinion Pooling directly to asset returns, we can embed it within a factor model so that expert views are expressed on the factors.
factor_opinion_1 = EntropyPooling(
mean_views=["QUAL == -0.0005"], cvar_views=["SIZE == 0.08"]
)
factor_opinion_2 = EntropyPooling(cvar_views=["SIZE == 0.09"])
factor_opinion_pooling = OpinionPooling(
estimators=[("opinion_1", factor_opinion_1), ("opinion_2", factor_opinion_2)],
opinion_probabilities=[0.6, 0.4],
)
factor_model = FactorModel(factor_prior_estimator=factor_opinion_pooling)
model = RiskBudgeting(risk_measure=RiskMeasure.CVAR, prior_estimator=factor_model)
model.fit(X, factors)
print(model.weights_)
sample_weight = model.prior_estimator_.return_distribution_.sample_weight
summary(factors, sample_weight)
[0.09333252 0.09726895 0.10925494 0.21357264 0.10861864 0.176453
0.20149932]
Factor Opinion Pooling on Synthetic Data#
Rather than applying Opinion Pooling directly to a limited historical factor prior, we generate 100,000 synthetic factor returns using a Vine Copula. This synthetic dataset extrapolates the tail dependencies and allows more extreme Entropy Pooling views that would be infeasible with sparse historical data:
vine = VineCopula(log_transform=True, n_jobs=-1, random_state=0)
factor_synth = SyntheticData(n_samples=100_000, distribution_estimator=vine)
factor_opinion_1 = EntropyPooling(cvar_views=["SIZE == 0.15"])
factor_opinion_2 = EntropyPooling(cvar_views=["SIZE == 0.20"])
factor_opinion_pooling = OpinionPooling(
prior_estimator=factor_synth,
estimators=[("opinion_1", factor_opinion_1), ("opinion_2", factor_opinion_2)],
opinion_probabilities=[0.6, 0.4],
)
factor_model = FactorModel(factor_prior_estimator=factor_opinion_pooling)
model = HierarchicalRiskParity(
risk_measure=RiskMeasure.CVAR, prior_estimator=factor_model
)
model.fit(X, factors)
print(model.weights_)
[0.04468221 0.08506756 0.09396145 0.17199215 0.09416958 0.13640712
0.37371993]
Following scikit-learn conventions, all fitted attributes end with a trailing underscore. You can inspect each model step-by-step by drilling into these attributes:
fitted_vine = model.prior_estimator_.factor_prior_estimator_.prior_estimator_.distribution_estimator_
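For example, a quick sanity check (any scikit-learn-style type inspection works here) confirms that the retrieved object is the fitted VineCopula:
print(type(fitted_vine).__name__)  # -> "VineCopula"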
Stress Test#
Having demonstrated ex-ante Opinion Pooling (optimizing a portfolio based on specific views), we now apply ex-post Opinion Pooling to stress-test an existing portfolio. We start with a Hierarchical Risk Parity (HRP) portfolio using CVaR as the risk measure, optimized on historical data without Opinion Pooling:
model = HierarchicalRiskParity(risk_measure=RiskMeasure.CVAR)
model.fit(X)
print(model.weights_)
portfolio = model.predict(X)
portfolio.name = "HRP Unstressed"
# Add to a Population for better comparison with the stressed portfolios.
population = Population([portfolio])
[0.06021919 0.1059091 0.08763318 0.18841288 0.11675892 0.25472467
0.18634206]
Create a Stressed Distribution#
Let’s use Opinion Pooling on synthetic data by pooling two distinct expert views on AMD’s CVaR:
vine = VineCopula(log_transform=True, n_jobs=-1, random_state=0)
synth = SyntheticData(n_samples=100_000, distribution_estimator=vine)
opinion_1 = EntropyPooling(cvar_beta=0.90, cvar_views=["AMD == 0.08"])
opinion_2 = EntropyPooling(cvar_views=["AMD == 0.10"])
opinion_pooling = OpinionPooling(
prior_estimator=synth,
estimators=[("opinion_1", opinion_1), ("opinion_2", opinion_2)],
opinion_probabilities=[0.6, 0.4],
)
opinion_pooling.fit(X)
# We retrieve the stressed distribution:
stressed_dist = opinion_pooling.return_distribution_
# We stress-test our portfolio:
stressed_ptf = model.predict(stressed_dist)
# Add the stressed portfolio to the population
stressed_ptf.name = "HRP Stressed"
population.append(stressed_ptf)
Now let’s apply Factor Opinion Pooling to synthetic factor data by specifying two expert views on the CVaR of the quality factor (QUAL):
factor_synth = SyntheticData(n_samples=100_000, distribution_estimator=vine)
factor_opinion_1 = EntropyPooling(cvar_beta=0.90, cvar_views=["QUAL == 0.10"])
factor_opinion_2 = EntropyPooling(cvar_views=["QUAL == 0.12"])
factor_opinion_pooling = OpinionPooling(
prior_estimator=factor_synth,
estimators=[("opinion_1", factor_opinion_1), ("opinion_2", factor_opinion_2)],
opinion_probabilities=[0.6, 0.4],
)
factor_model = FactorModel(factor_prior_estimator=factor_opinion_pooling)
factor_model.fit(X, factors)
# We retrieve the stressed distribution:
stressed_dist = factor_model.return_distribution_
# We stress-test our portfolio:
stressed_ptf = model.predict(stressed_dist)
# Add the stressed portfolio to the population
stressed_ptf.name = "HRP Factor Stressed"
population.append(stressed_ptf)
Analysis of Unstressed vs Stressed Portfolios#
pop_summary = population.summary()
pop_summary.loc[
[
"Mean",
"Standard Deviation",
"CVaR at 95%",
"Annualized Sharpe Ratio",
"Worst Realization",
]
]
fig = population.plot_returns_distribution(percentile_cutoff=0.05)
show(fig)
Conclusion#
In this tutorial, we demonstrated how to leverage Opinion Pooling to aggregate multiple expert views into every stage of portfolio management, from ex-ante optimization to ex-post stress testing.