skfolio#

skfolio is a Python library for portfolio optimization built on top of scikit-learn. It offers a unified interface and tools compatible with scikit-learn to build, fine-tune, and cross-validate portfolio models.

It is distributed under the open-source 3-Clause BSD license.

examples

Installation#

skfolio is available on PyPI and can be installed with:

$ pip install skfolio

Key Concepts#

Since the development of modern portfolio theory by Markowitz (1952), mean-variance optimization (MVO) has received considerable attention.

Unfortunately, it faces a number of shortcomings, including high sensitivity to the input parameters (expected returns and covariance), weight concentration, high turnover, and poor out-of-sample performance.

It is well-known that naive allocation (1/N, inverse-vol, etc.) tends to outperform MVO out-of-sample (DeMiguel, 2007).

Numerous approaches have been developed to alleviate these shortcomings (shrinkage, additional constraints, regularization, uncertainty set, higher moments, Bayesian approaches, coherent risk measures, left-tail risk optimization, distributionally robust optimization, factor model, risk-parity, hierarchical clustering, ensemble methods, pre-selection, etc.).

Given the large number of methods, and the fact that they can be combined, there is a need for a unified framework with a machine-learning approach to perform model selection, validation, and parameter tuning while mitigating the risk of data leakage and overfitting.

This framework is built on scikit-learn’s API.

Available models#

  • Portfolio Optimization:
    • Naive:
      • Equal-Weighted

      • Inverse-Volatility

      • Random (Dirichlet)

    • Convex:
      • Mean-Risk

      • Risk Budgeting

      • Maximum Diversification

      • Distributionally Robust CVaR

    • Clustering:
      • Hierarchical Risk Parity

      • Hierarchical Equal Risk Contribution

      • Nested Clusters Optimization

    • Ensemble Methods:
      • Stacking Optimization

  • Expected Returns Estimator:
    • Empirical

    • Exponentially Weighted

    • Equilibrium

    • Shrinkage

  • Covariance Estimator:
    • Empirical

    • Gerber

    • Denoising

    • Detoning

    • Exponentially Weighted

    • Ledoit-Wolf

    • Oracle Approximating Shrinkage

    • Shrunk Covariance

    • Graphical Lasso CV

    • Implied Covariance

  • Distance Estimator:
    • Pearson Distance

    • Kendall Distance

    • Spearman Distance

    • Covariance Distance (based on any of the above covariance estimators)

    • Distance Correlation

    • Variation of Information

  • Prior Estimator:
    • Empirical

    • Black & Litterman

    • Factor Model

  • Uncertainty Set Estimator:
    • On Expected Returns:
      • Empirical

      • Circular Bootstrap

    • On Covariance:
      • Empirical

      • Circular Bootstrap

  • Pre-Selection Transformer:
    • Non-Dominated Selection

    • Select K Extremes (Best or Worst)

    • Drop Highly Correlated Assets

    • Select Non-Expiring Assets

    • Select Complete Assets (handle late inception, delisting, etc.)

  • Cross-Validation and Model Selection:
    • Compatible with all sklearn methods (KFold, etc.)

    • Walk Forward

    • Combinatorial Purged Cross-Validation

  • Hyper-Parameter Tuning:
    • Compatible with all sklearn methods (GridSearchCV, RandomizedSearchCV)

  • Risk Measures:
    • Variance

    • Semi-Variance

    • Mean Absolute Deviation

    • First Lower Partial Moment

    • CVaR (Conditional Value at Risk)

    • EVaR (Entropic Value at Risk)

    • Worst Realization

    • CDaR (Conditional Drawdown at Risk)

    • Maximum Drawdown

    • Average Drawdown

    • EDaR (Entropic Drawdown at Risk)

    • Ulcer Index

    • Gini Mean Difference

    • Value at Risk

    • Drawdown at Risk

    • Entropic Risk Measure

    • Fourth Central Moment

    • Fourth Lower Partial Moment

    • Skew

    • Kurtosis

  • Optimization Features:
    • Minimize Risk

    • Maximize Returns

    • Maximize Utility

    • Maximize Ratio

    • Transaction Costs

    • Management Fees

    • L1 and L2 Regularization

    • Weight Constraints

    • Group Constraints

    • Budget Constraints

    • Tracking Error Constraints

    • Turnover Constraints

    • Cardinality and Group Cardinality Constraints

    • Threshold (Long and Short) Constraints

Quickstart#

The code snippets below are designed to introduce the functionality of skfolio so you can start using it quickly. It follows the same API as scikit-learn.

For more detailed information see the Examples, User Guide and API Reference .

Imports#

from sklearn import set_config
from sklearn.model_selection import (
    GridSearchCV,
    KFold,
    RandomizedSearchCV,
    train_test_split,
)
from sklearn.pipeline import Pipeline
from scipy.stats import loguniform

from skfolio import RatioMeasure, RiskMeasure
from skfolio.datasets import load_factors_dataset, load_sp500_dataset
from skfolio.model_selection import (
    CombinatorialPurgedCV,
    WalkForward,
    cross_val_predict,
)
from skfolio.moments import (
    DenoiseCovariance,
    DetoneCovariance,
    EWMu,
    GerberCovariance,
    ShrunkMu,
)
from skfolio.optimization import (
    MeanRisk,
    NestedClustersOptimization,
    ObjectiveFunction,
    RiskBudgeting,
)
from skfolio.pre_selection import SelectKExtremes
from skfolio.preprocessing import prices_to_returns
from skfolio.prior import BlackLitterman, EmpiricalPrior, FactorModel
from skfolio.uncertainty_set import BootstrapMuUncertaintySet

Load Dataset#

prices = load_sp500_dataset()

Train/Test split#

X = prices_to_returns(prices)
X_train, X_test = train_test_split(X, test_size=0.33, shuffle=False)

Minimum Variance#

model = MeanRisk()

Fit on training set#

model.fit(X_train)

print(model.weights_)

Predict on test set#

portfolio = model.predict(X_test)

print(portfolio.annualized_sharpe_ratio)
print(portfolio.summary())

Maximum Sortino Ratio#

model = MeanRisk(
    objective_function=ObjectiveFunction.MAXIMIZE_RATIO,
    risk_measure=RiskMeasure.SEMI_VARIANCE,
)

Denoised Covariance & Shrunk Expected Returns#

model = MeanRisk(
    objective_function=ObjectiveFunction.MAXIMIZE_RATIO,
    prior_estimator=EmpiricalPrior(
        mu_estimator=ShrunkMu(), covariance_estimator=DenoiseCovariance()
    ),
)

Uncertainty Set on Expected Returns#

model = MeanRisk(
    objective_function=ObjectiveFunction.MAXIMIZE_RATIO,
    mu_uncertainty_set_estimator=BootstrapMuUncertaintySet(),
)

Weight Constraints & Transaction Costs#

model = MeanRisk(
    min_weights={"AAPL": 0.10, "JPM": 0.05},
    max_weights=0.8,
    transaction_costs={"AAPL": 0.0001, "RRC": 0.0002},
    groups=[
        ["Equity"] * 3 + ["Fund"] * 5 + ["Bond"] * 12,
        ["US"] * 2 + ["Europe"] * 8 + ["Japan"] * 10,
    ],
    linear_constraints=[
        "Equity <= 0.5 * Bond",
        "US >= 0.1",
        "Europe >= 0.5 * Fund",
        "Japan <= 1",
    ],
)
model.fit(X_train)

Risk Parity on CVaR#

model = RiskBudgeting(risk_measure=RiskMeasure.CVAR)

Risk Parity & Gerber Covariance#

model = RiskBudgeting(
    prior_estimator=EmpiricalPrior(covariance_estimator=GerberCovariance())
)

Nested Cluster Optimization with Cross-Validation and Parallelization#

model = NestedClustersOptimization(
    inner_estimator=MeanRisk(risk_measure=RiskMeasure.CVAR),
    outer_estimator=RiskBudgeting(risk_measure=RiskMeasure.VARIANCE),
    cv=KFold(),
    n_jobs=-1,
)

Randomized Search of the L2 Norm#

randomized_search = RandomizedSearchCV(
    estimator=MeanRisk(),
    cv=WalkForward(train_size=252, test_size=60),
    param_distributions={
        "l2_coef": loguniform(1e-3, 1e-1),
    },
)
randomized_search.fit(X_train)

best_model = randomized_search.best_estimator_

print(best_model.weights_)

Grid Search on embedded parameters#

model = MeanRisk(
    objective_function=ObjectiveFunction.MAXIMIZE_RATIO,
    risk_measure=RiskMeasure.VARIANCE,
    prior_estimator=EmpiricalPrior(mu_estimator=EWMu(alpha=0.2)),
)

print(model.get_params(deep=True))

gs = GridSearchCV(
    estimator=model,
    cv=KFold(n_splits=5, shuffle=False),
    n_jobs=-1,
    param_grid={
        "risk_measure": [
            RiskMeasure.VARIANCE,
            RiskMeasure.CVAR,
            RiskMeasure.VARIANCE.CDAR,
        ],
        "prior_estimator__mu_estimator__alpha": [0.05, 0.1, 0.2, 0.5],
    },
)
gs.fit(X)

best_model = gs.best_estimator_

print(best_model.weights_)

Black & Litterman Model#

views = ["AAPL - BBY == 0.03 ", "CVX - KO == 0.04", "MSFT == 0.06 "]
model = MeanRisk(
    objective_function=ObjectiveFunction.MAXIMIZE_RATIO,
    prior_estimator=BlackLitterman(views=views),
)

Factor Model#

factor_prices = load_factors_dataset()

X, y = prices_to_returns(prices, factor_prices)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, shuffle=False)

model = MeanRisk(prior_estimator=FactorModel())
model.fit(X_train, y_train)

print(model.weights_)

portfolio = model.predict(X_test)

print(portfolio.calmar_ratio)
print(portfolio.summary())

Factor Model & Covariance Detoning#

model = MeanRisk(
    prior_estimator=FactorModel(
        factor_prior_estimator=EmpiricalPrior(covariance_estimator=DetoneCovariance())
    )
)

Black & Litterman Factor Model#

factor_views = ["MTUM - QUAL == 0.03 ", "SIZE - TLT == 0.04", "VLUE == 0.06"]
model = MeanRisk(
    objective_function=ObjectiveFunction.MAXIMIZE_RATIO,
    prior_estimator=FactorModel(
        factor_prior_estimator=BlackLitterman(views=factor_views),
    ),
)

Pre-Selection Pipeline#

set_config(transform_output="pandas")
model = Pipeline(
    [
        ("pre_selection", SelectKExtremes(k=10, highest=True)),
        ("optimization", MeanRisk()),
    ]
)
model.fit(X_train)

portfolio = model.predict(X_test)

K-fold Cross-Validation#

model = MeanRisk()
mmp = cross_val_predict(model, X_test, cv=KFold(n_splits=5))
# mmp is the predicted MultiPeriodPortfolio object composed of 5 Portfolios (1 per testing fold)

mmp.plot_cumulative_returns()
print(mmp.summary()

Combinatorial Purged Cross-Validation#

model = MeanRisk()

cv = CombinatorialPurgedCV(n_folds=10, n_test_folds=2)

print(cv.get_summary(X_train))

population = cross_val_predict(model, X_train, cv=cv)

population.plot_distribution(
    measure_list=[RatioMeasure.SHARPE_RATIO, RatioMeasure.SORTINO_RATIO]
)
population.plot_cumulative_returns()
print(population.summary())

Recognition#

We would like to thank all contributors to our direct dependencies, such as scikit-learn and cvxpy, as well as the contributors of the following resources that served as sources of inspiration:

* PyPortfolioOpt
* Riskfolio-Lib
* scikit-portfolio
* microprediction
* statsmodels
* rsome
* gautier.marti.ai

Citation#

If you use skfolio in a scientific publication, we would appreciate citations:

Bibtex entry:

@misc{skfolio,
      author = {Hugo Delatte, Carlo Nicolini},
      title = {skfolio},
      year  = {2023},
      url   = {https://github.com/skfolio/skfolio}