skfolio.prior.SyntheticData

class skfolio.prior.SyntheticData(distribution_estimator=None, n_samples=1000, sample_args=None)

Synthetic Data Estimator.

The Synthetic Data model estimates a PriorModel by fitting a distribution_estimator and sampling new returns data from it.

The default distribution_estimator is a Regular Vine Copula model. Other common choices are Generative Adversarial Networks (GANs) or Variational Autoencoders (VAEs).

This class is particularly useful when the historical data contains too few joint tail observations, so that tail dependencies must be extrapolated for tail-risk optimization, or when optimizing under conditional or stressed scenarios.
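The fit-then-sample idea described above can be sketched in a few lines of standard-library Python. This is a conceptual illustration only, not skfolio's implementation: `ToyGaussian` and `synthetic_prior` are hypothetical names, a single asset with a Gaussian distribution stands in for the vine copula, and the "prior" is reduced to a mean and a variance.

```python
import random
import statistics


class ToyGaussian:
    """Hypothetical stand-in for a distribution estimator (one asset only)."""

    def fit(self, returns):
        # Estimate the distribution parameters from historical returns.
        self.mu_ = statistics.fmean(returns)
        self.sigma_ = statistics.stdev(returns)
        return self

    def sample(self, n_samples, seed=None):
        # Draw new synthetic returns from the fitted distribution.
        rng = random.Random(seed)
        return [rng.gauss(self.mu_, self.sigma_) for _ in range(n_samples)]


def synthetic_prior(returns, n_samples=1000, seed=None):
    """Fit the toy estimator, sample from it, and summarize the samples."""
    estimator = ToyGaussian().fit(returns)
    synthetic = estimator.sample(n_samples, seed=seed)
    return {
        "returns": synthetic,
        "mu": statistics.fmean(synthetic),
        "variance": statistics.variance(synthetic),
    }


historical = [0.01, -0.02, 0.015, 0.003, -0.007, 0.012]
prior = synthetic_prior(historical, n_samples=1000, seed=0)
print(len(prior["returns"]))
```

The moments are computed from the synthetic sample rather than from the six historical observations, which is the point of the approach: the fitted distribution can produce far more scenarios than the history contains.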

Parameters:
distribution_estimator : BaseEstimator, optional

Estimator to model the distribution of asset returns. It must inherit from BaseEstimator and implement a sample method. If None, the default VineCopula() model is used.

n_samples : int, default=1000

Number of samples to generate from the distribution_estimator.

sample_args : dict, optional

Additional keyword arguments to pass to the sample method of the distribution_estimator.
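One way to picture how `sample_args` reaches the estimator is keyword unpacking into `sample`. The sketch below is an assumption about the call pattern, not a copy of skfolio's code; `ToyEstimator`, its `shift` keyword, and the `draw` helper are hypothetical.

```python
class ToyEstimator:
    """Hypothetical estimator whose sample method accepts an extra keyword."""

    def sample(self, n_samples, shift=0.0):
        # Return deterministic "samples" so the effect of `shift` is visible.
        return [shift + i / n_samples for i in range(n_samples)]


def draw(estimator, n_samples=1000, sample_args=None):
    # Treat a missing sample_args as "no extra keyword arguments".
    sample_args = {} if sample_args is None else sample_args
    return estimator.sample(n_samples=n_samples, **sample_args)


plain = draw(ToyEstimator(), n_samples=4)
shifted = draw(ToyEstimator(), n_samples=4, sample_args={"shift": -0.2})
print(plain[0], shifted[0])
```

Under this pattern, any key in `sample_args` must match a keyword that the estimator's `sample` method accepts, as `conditioning` does for `VineCopula` in the examples on this page.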

Attributes:
prior_model_ : PriorModel

The assets PriorModel.

distribution_estimator_ : BaseEstimator

The fitted distribution estimator.

n_features_in_ : int

Number of assets seen during fit.

feature_names_in_ : ndarray of shape (n_features_in_,)

Names of features seen during fit. Defined only when X has feature names that are all strings.

Examples

>>> import numpy as np
>>> from skfolio.datasets import load_sp500_dataset, load_factors_dataset
>>> from skfolio.preprocessing import prices_to_returns
>>> from skfolio.distribution import VineCopula
>>> from skfolio.optimization import MeanRisk
>>> from skfolio.prior import FactorModel, SyntheticData
>>> from skfolio import RiskMeasure
>>>
>>> # Load historical prices and convert them to returns
>>> prices = load_sp500_dataset()
>>> factors = load_factors_dataset()
>>> X, y = prices_to_returns(prices, factors)
>>>
>>> # Instantiate the SyntheticData model and fit it
>>> model = SyntheticData()
>>> model.fit(X)
>>> print(model.prior_model_)
>>>
>>> # Minimum CVaR optimization on synthetic returns
>>> model = MeanRisk(
...    risk_measure=RiskMeasure.CVAR,
...    prior_estimator=SyntheticData(
...        distribution_estimator=VineCopula(log_transform=True, n_jobs=-1),
...        n_samples=2000,
...    )
... )
>>> model.fit(X)
>>> print(model.weights_)
>>>
>>> # Minimum CVaR optimization on Stressed Factors
>>> factor_model = FactorModel(
...    factor_prior_estimator=SyntheticData(
...        distribution_estimator=VineCopula(
...            central_assets=["QUAL"],
...            log_transform=True,
...            n_jobs=-1,
...        ),
...        n_samples=5000,
...        sample_args=dict(conditioning={"QUAL": -0.2}),
...    )
... )
>>> model = MeanRisk(risk_measure=RiskMeasure.CVAR, prior_estimator=factor_model)
>>> model.fit(X, y)
>>> print(model.weights_)
>>>
>>> # Stress Test the Portfolio
>>> factor_model.set_params(factor_prior_estimator__sample_args=dict(
...     conditioning={"QUAL": -0.5}
... ))
>>> factor_model.fit(X, y)
>>> stressed_X = factor_model.prior_model_.returns
>>> stressed_ptf = model.predict(stressed_X)

Methods

fit(X[, y])

Fit the Synthetic Data estimator.

get_metadata_routing()

Get metadata routing of this object.

get_params([deep])

Get parameters for this estimator.

set_params(**params)

Set the parameters of this estimator.

fit(X, y=None, **fit_params)

Fit the Synthetic Data estimator.

Parameters:
X : array-like of shape (n_observations, n_assets)

Price returns of the assets.

y : Ignored

Not used, present for API consistency by convention.

**fit_params : dict

Parameters to pass to the underlying estimators. Only available if enable_metadata_routing=True, which can be set by using sklearn.set_config(enable_metadata_routing=True). See the Metadata Routing User Guide for more details.

Returns:
self : SyntheticData

Fitted estimator.

get_metadata_routing()

Get metadata routing of this object.

See the User Guide for details on how the routing mechanism works.

Returns:
routing : MetadataRequest

A MetadataRequest encapsulating routing information.

get_params(deep=True)

Get parameters for this estimator.

Parameters:
deep : bool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
params : dict

Parameter names mapped to their values.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
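The `<component>__<parameter>` naming convention can be illustrated with a small helper that groups flat keys by component. This is a simplified sketch of the convention only, not scikit-learn's or skfolio's actual implementation; `split_nested_params` is a hypothetical name.

```python
def split_nested_params(params):
    """Group flat `<component>__<parameter>` keys by component name."""
    own, nested = {}, {}
    for key, value in params.items():
        # Split on the first double underscore only, so deeper nesting
        # (a__b__c) stays intact in the sub-key for the component to resolve.
        name, sep, sub_key = key.partition("__")
        if sep:
            nested.setdefault(name, {})[sub_key] = value
        else:
            own[name] = value
    return own, nested


own, nested = split_nested_params(
    {
        "n_samples": 5000,
        "distribution_estimator__log_transform": True,
        "distribution_estimator__n_jobs": -1,
    }
)
print(own)
print(nested)
```

Here `n_samples` would be set on the estimator itself, while `log_transform` and `n_jobs` would be forwarded to its `distribution_estimator` component.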

Parameters:
**params : dict

Estimator parameters.

Returns:
self : estimator instance

Estimator instance.