skfolio.prior.SyntheticData

class skfolio.prior.SyntheticData(distribution_estimator=None, n_samples=1000, sample_args=None)

Synthetic Data Estimator.

The Synthetic Data model estimates a ReturnDistribution by fitting a distribution_estimator and sampling new returns data from it.

The default distribution_estimator is a Regular Vine Copula model. Other common choices are Generative Adversarial Networks (GANs) or Variational Autoencoders (VAEs).

This class is particularly useful when tail observations in the historical data are sparse and tail dependencies must be extrapolated for tail-risk optimization, or when optimizing under conditional or stressed scenarios.
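The fit-then-sample flow described above can be sketched in plain Python. This is a conceptual illustration only, not skfolio's implementation: `GaussianSampler` and `synthetic_returns` are hypothetical names, the `seed` argument is an assumption made for reproducibility, and the sampler draws independent Gaussians, so it ignores the cross-asset dependence that a vine copula would model.

```python
import random
import statistics


class GaussianSampler:
    """Hypothetical stand-in for a distribution estimator (not part of skfolio)."""

    def fit(self, X, y=None):
        # Estimate a per-asset mean and standard deviation from the columns of X.
        columns = list(zip(*X))
        self.means_ = [statistics.fmean(col) for col in columns]
        self.stdevs_ = [statistics.stdev(col) for col in columns]
        return self

    def sample(self, n_samples=1, seed=None):
        # Draw independent Gaussian returns for each asset.
        rng = random.Random(seed)
        return [
            [rng.gauss(m, s) for m, s in zip(self.means_, self.stdevs_)]
            for _ in range(n_samples)
        ]


def synthetic_returns(estimator, X, n_samples=1000, sample_args=None):
    """Fit the estimator on X, then sample n_samples synthetic observations."""
    sample_args = {} if sample_args is None else sample_args
    estimator.fit(X)
    # Extra keyword arguments are forwarded to the estimator's sample method.
    return estimator.sample(n_samples=n_samples, **sample_args)


# Four observations of two assets (illustrative numbers, not real data).
X = [[0.01, -0.02], [0.03, 0.00], [-0.01, 0.01], [0.02, 0.02]]
synthetic = synthetic_returns(
    GaussianSampler(), X, n_samples=500, sample_args={"seed": 0}
)
print(len(synthetic), len(synthetic[0]))  # 500 2
```

The sketch shows the roles of the three constructor arguments: the estimator supplies `fit` and `sample`, `n_samples` sets the size of the synthetic data set, and `sample_args` carries any extra keyword arguments for `sample`.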

Parameters:

distribution_estimator (BaseEstimator, optional): Estimator to model the distribution of asset returns. It must inherit from BaseEstimator and implement a sample method. If None, the default VineCopula() model is used.
n_samples (int, default=1000): Number of samples to generate from the distribution_estimator.
sample_args (dict, optional): Additional keyword arguments to pass to the sample method of the distribution_estimator.

Attributes:

return_distribution_ (ReturnDistribution): Fitted ReturnDistribution to be used by the optimization estimators, containing the assets' synthetic data distribution and moment estimates.
distribution_estimator_ (BaseEstimator): The fitted distribution estimator.
n_features_in_ (int): Number of assets seen during fit.
feature_names_in_ (ndarray of shape (n_features_in_,)): Names of features seen during fit. Defined only when X has feature names that are all strings.

Methods

`fit`(X[, y])	Fit the Synthetic Data estimator.
`get_metadata_routing`()	Get metadata routing of this object.
`get_params`([deep])	Get parameters for this estimator.
`set_params`(**params)	Set the parameters of this estimator.

Examples

>>> import numpy as np
>>> from skfolio.datasets import load_sp500_dataset, load_factors_dataset
>>> from skfolio.preprocessing import prices_to_returns
>>> from skfolio.distribution import VineCopula
>>> from skfolio.optimization import MeanRisk
>>> from skfolio.prior import FactorModel, SyntheticData
>>> from skfolio import RiskMeasure
>>>
>>> # Load historical prices and convert them to returns
>>> prices = load_sp500_dataset()
>>> factors = load_factors_dataset()
>>> X, y = prices_to_returns(prices, factors)
>>>
>>> # Instantiate the SyntheticData model and fit it
>>> model = SyntheticData()
>>> model.fit(X)
>>> print(model.return_distribution_)
>>>
>>> # Minimum CVaR optimization on synthetic returns
>>> model = MeanRisk(
...    risk_measure=RiskMeasure.CVAR,
...    prior_estimator=SyntheticData(
...        distribution_estimator=VineCopula(log_transform=True, n_jobs=-1),
...        n_samples=2000,
...    )
... )
>>> model.fit(X)
>>> print(model.weights_)
>>>
>>> # Minimum CVaR optimization on Stressed Factors
>>> factor_model = FactorModel(
...    factor_prior_estimator=SyntheticData(
...        distribution_estimator=VineCopula(
...            central_assets=["QUAL"],
...            log_transform=True,
...            n_jobs=-1,
...        ),
...        n_samples=5000,
...        sample_args=dict(conditioning={"QUAL": -0.2}),
...    )
... )
>>> model = MeanRisk(risk_measure=RiskMeasure.CVAR, prior_estimator=factor_model)
>>> model.fit(X, y)
>>> print(model.weights_)
>>>
>>> # Stress Test the Portfolio
>>> factor_model.set_params(factor_prior_estimator__sample_args=dict(
...     conditioning={"QUAL": -0.5}
... ))
>>> factor_model.fit(X, y)
>>> stressed_dist = factor_model.return_distribution_
>>> stressed_ptf = model.predict(stressed_dist)

fit(X, y=None, **fit_params)

Fit the Synthetic Data estimator.

Parameters:

X (array-like of shape (n_observations, n_assets)): Price returns of the assets.
y (Ignored): Not used, present for API consistency by convention.
**fit_params (dict): Parameters to pass to the underlying estimators. Only available if enable_metadata_routing=True, which can be set by using sklearn.set_config(enable_metadata_routing=True). See Metadata Routing User Guide for more details.

Returns:

self (SyntheticData): Fitted estimator.

get_metadata_routing()

Get metadata routing of this object.

See the User Guide for how the routing mechanism works.

Returns:

routing (MetadataRequest): A MetadataRequest encapsulating routing information.

get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep (bool, default=True): If True, return the parameters for this estimator and for contained subobjects that are estimators.

Returns:

params (dict): Parameter names mapped to their values.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:

**params (dict): Estimator parameters.

Returns:

self (estimator instance): Estimator instance.
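The `<component>__<parameter>` naming convention used by set_params can be illustrated with a small stand-alone sketch. It only shows how such keys split on the first double underscore; `split_nested_params` is a hypothetical helper written for this illustration and is not part of skfolio or scikit-learn.

```python
def split_nested_params(params):
    """Group `<component>__<parameter>` keys by their top-level component."""
    own, nested = {}, {}
    for key, value in params.items():
        # Split on the first "__" only; deeper nesting stays in sub_key.
        name, delim, sub_key = key.partition("__")
        if delim:
            nested.setdefault(name, {})[sub_key] = value
        else:
            own[name] = value
    return own, nested


own, nested = split_nested_params(
    {
        "n_samples": 2000,
        "distribution_estimator__log_transform": True,
        "distribution_estimator__n_jobs": -1,
    }
)
print(own)     # {'n_samples': 2000}
print(nested)  # {'distribution_estimator': {'log_transform': True, 'n_jobs': -1}}
```

Under this convention, a key such as `factor_prior_estimator__sample_args` addresses the `sample_args` parameter of the component named `factor_prior_estimator`, as in the stress-test example above.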