skfolio.distribution.JohnsonSU#

class skfolio.distribution.JohnsonSU(loc=None, scale=None, random_state=None)[source]#

Johnson SU Distribution Estimation.

This estimator fits a univariate Johnson SU distribution to the input data. The Johnson SU distribution is flexible and can capture both skewness and fat tails, making it appropriate for financial time series modeling.

The probability density function is:

\[f(x, a, b) = \frac{b}{\sqrt{x^2 + 1}} \phi(a + b \log(x + \sqrt{x^2 + 1}))\]

where \(x\), \(a\), and \(b\) are real scalars; \(b > 0\). \(\phi\) is the pdf of the normal distribution.

The probability density above is defined in the “standardized” form. To shift and/or scale the distribution use the loc and scale parameters. Specifically, pdf(x, a, b, loc, scale) is equivalent to pdf(y, a, b) / scale with y = (x - loc) / scale.

For more information, refer to the scipy documentation.
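Since the estimator follows the scipy parametrization, both the standardized density and the shift/scale relation above can be checked numerically with scipy.stats.johnsonsu. A minimal sketch (the parameter values are illustrative, not fitted to any data):

```python
import numpy as np
from scipy.stats import johnsonsu, norm

# Illustrative parameter values (not fitted to any data)
a, b, loc, scale = 0.07, 1.08, 0.001, 0.008
x = np.linspace(-0.05, 0.05, 11)

# Shift/scale relation: pdf(x, a, b, loc, scale) == pdf(y, a, b) / scale
y = (x - loc) / scale
lhs = johnsonsu.pdf(x, a, b, loc=loc, scale=scale)
rhs = johnsonsu.pdf(y, a, b) / scale

# Standardized pdf written out explicitly:
# f(y, a, b) = b / sqrt(y^2 + 1) * phi(a + b * asinh(y)),
# where asinh(y) = log(y + sqrt(y^2 + 1))
manual = b / np.sqrt(y**2 + 1) * norm.pdf(a + b * np.arcsinh(y))
```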

Parameters:
loc : float, optional

If provided, the location parameter is fixed to this value during fitting. Otherwise, it is estimated from the data.

scale : float, optional

If provided, the scale parameter is fixed to this value during fitting. Otherwise, it is estimated from the data.

random_state : int, RandomState instance or None, default=None

Seed or random state to ensure reproducibility.

Attributes:
a_ : float

The fitted first shape parameter of the Johnson SU distribution.

b_ : float

The fitted second shape parameter of the Johnson SU distribution.

loc_ : float

The fitted location parameter.

scale_ : float

The fitted scale parameter.

Examples

>>> from skfolio.datasets import load_sp500_index
>>> from skfolio.preprocessing import prices_to_returns
>>> from skfolio.distribution.univariate import JohnsonSU
>>>
>>> # Load historical prices and convert them to returns
>>> prices = load_sp500_index()
>>> X = prices_to_returns(prices)
>>>
>>> # Initialize the estimator.
>>> model = JohnsonSU()
>>>
>>> # Fit the model to the data.
>>> model.fit(X)
>>>
>>> # Display the fitted parameters.
>>> print(model.fitted_repr)
JohnsonSU(0.0742, 1.08, 0.00115, 0.00774)
>>>
>>> # Compute the log-likelihood, total log-likelihood, CDF, PPF, AIC, and BIC
>>> log_likelihood = model.score_samples(X)
>>> score = model.score(X)
>>> cdf = model.cdf(X)
>>> ppf = model.ppf(X)
>>> aic = model.aic(X)
>>> bic = model.bic(X)
>>>
>>> # Generate 5 new samples from the fitted distribution.
>>> samples = model.sample(n_samples=5)
>>>
>>> # Plot the estimated probability density function (PDF).
>>> fig = model.plot_pdf()
>>> fig.show()

Methods

aic(X)

Compute the Akaike Information Criterion (AIC) for the model given data X.

bic(X)

Compute the Bayesian Information Criterion (BIC) for the model given data X.

cdf(X)

Compute the cumulative distribution function (CDF) for the given data.

fit(X[, y])

Fit the univariate Johnson SU distribution model.

get_metadata_routing()

Get metadata routing of this object.

get_params([deep])

Get parameters for this estimator.

plot_pdf([X, title])

Plot the probability density function (PDF).

ppf(X)

Compute the percent point function (inverse of the CDF) for the given probabilities.

qq_plot(X[, title])

Plot the empirical quantiles of the sample X versus the quantiles of the fitted model.

sample([n_samples])

Generate random samples from the fitted distribution.

score(X[, y])

Compute the total log-likelihood under the model.

score_samples(X)

Compute the log-likelihood of each sample (log-pdf) under the model.

set_params(**params)

Set the parameters of this estimator.

aic(X)#

Compute the Akaike Information Criterion (AIC) for the model given data X.

The AIC is defined as:

\[\mathrm{AIC} = -2 \, \log L \;+\; 2 k,\]

where

  • \(\log L\) is the total log-likelihood

  • \(k\) is the number of parameters in the model

A lower AIC value indicates a better trade-off between model fit and complexity.

Parameters:
X : array-like of shape (n_observations, n_features)

The input data on which to compute the AIC.

Returns:
aic : float

The AIC of the fitted model on the given data.

Notes

In practice, both AIC and BIC measure the trade-off between model fit and complexity, but BIC tends to prefer simpler models for large \(n\) because of the \(\ln(n)\) term.

References

[1]

“A new look at the statistical model identification”, Akaike (1974).
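The formula above can be sketched directly with scipy.stats.johnsonsu (used here in place of this estimator; the shape values passed to rvs are arbitrary):

```python
import numpy as np
from scipy.stats import johnsonsu

# Simulate data from a Johnson SU with arbitrary shape values
rng = np.random.default_rng(0)
x = johnsonsu.rvs(0.1, 1.5, size=1_000, random_state=rng)

# Maximum-likelihood fit returns (a, b, loc, scale)
params = johnsonsu.fit(x)
log_l = johnsonsu.logpdf(x, *params).sum()  # total log-likelihood
k = len(params)                             # 4 estimated parameters
aic = -2.0 * log_l + 2.0 * k
```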

bic(X)#

Compute the Bayesian Information Criterion (BIC) for the model given data X.

The BIC is defined as:

\[\mathrm{BIC} = -2 \, \log L \;+\; k \,\ln(n),\]

where

  • \(\log L\) is the (maximized) total log-likelihood

  • \(k\) is the number of parameters in the model

  • \(n\) is the number of observations

A lower BIC value suggests a better fit while imposing a stronger penalty for model complexity than the AIC.

Parameters:
X : array-like of shape (n_observations, n_features)

The input data on which to compute the BIC.

Returns:
bic : float

The BIC of the fitted model on the given data.

Notes

In practice, both AIC and BIC measure the trade-off between model fit and complexity, but BIC tends to prefer simpler models for large \(n\) because of the \(\ln(n)\) term.

References

[1]

“Estimating the dimension of a model”, Schwarz, G. (1978).
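A sketch of the BIC formula and its relation to the AIC, again using scipy.stats.johnsonsu with arbitrary shape values rather than this estimator:

```python
import numpy as np
from scipy.stats import johnsonsu

# Simulate data and fit by maximum likelihood
rng = np.random.default_rng(0)
x = johnsonsu.rvs(0.1, 1.5, size=1_000, random_state=rng)

params = johnsonsu.fit(x)                  # (a, b, loc, scale)
log_l = johnsonsu.logpdf(x, *params).sum()
k, n = len(params), len(x)

aic = -2.0 * log_l + 2.0 * k
bic = -2.0 * log_l + k * np.log(n)
# For n > e^2 (about 7.4), ln(n) > 2, so the BIC penalty exceeds the AIC penalty
```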

cdf(X)#

Compute the cumulative distribution function (CDF) for the given data.

Parameters:
X : array-like of shape (n_observations, 1)

Data points at which to evaluate the CDF.

Returns:
cdf : ndarray of shape (n_observations, 1)

The CDF evaluated at each data point.

fit(X, y=None)[source]#

Fit the univariate Johnson SU distribution model.

Parameters:
X : array-like of shape (n_observations, 1)

The input data. X must contain a single column.

y : None

Ignored. Provided for compatibility with scikit-learn’s API.

Returns:
self : JohnsonSU

Returns the instance itself.

property fitted_repr#

String representation of the fitted univariate distribution.

get_metadata_routing()#

Get metadata routing of this object.

Please check User Guide on how the routing mechanism works.

Returns:
routing : MetadataRequest

A MetadataRequest encapsulating routing information.

get_params(deep=True)#

Get parameters for this estimator.

Parameters:
deep : bool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
params : dict

Parameter names mapped to their values.

property n_params#

Number of model parameters.

plot_pdf(X=None, title=None)#

Plot the probability density function (PDF).

Parameters:
X : array-like of shape (n_samples, 1), optional

If provided, it is used to plot the empirical data KDE for comparison versus the model PDF.

title : str, optional

The title for the plot. If not provided, a default title based on the fitted model’s representation is used.

Returns:
fig : go.Figure

A Plotly figure object containing the PDF plot.

ppf(X)#

Compute the percent point function (inverse of the CDF) for the given probabilities.

Parameters:
X : array-like of shape (n_observations, 1)

Probabilities for which to compute the corresponding quantiles.

Returns:
ppf : ndarray of shape (n_observations, 1)

The quantiles corresponding to the given probabilities.
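Because ppf inverts the CDF, mapping probabilities to quantiles and back should recover the inputs. A quick round-trip check with scipy.stats.johnsonsu (illustrative shape parameters):

```python
import numpy as np
from scipy.stats import johnsonsu

# Illustrative shape parameters
a, b = 0.07, 1.08
p = np.array([0.01, 0.05, 0.5, 0.95, 0.99])

q = johnsonsu.ppf(p, a, b)          # quantiles for the given probabilities
roundtrip = johnsonsu.cdf(q, a, b)  # maps the quantiles back to probabilities
```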

qq_plot(X, title=None)#

Plot the empirical quantiles of the sample X versus the quantiles of the fitted model.

Parameters:
X : array-like of shape (n_samples, 1)

Used to plot the empirical quantiles for comparison versus the model quantiles.

title : str, optional

The title for the plot. If not provided, a default title based on the fitted model’s representation is used.

Returns:
fig : go.Figure

A Plotly figure object containing the QQ plot.

sample(n_samples=1)#

Generate random samples from the fitted distribution.

Parameters:
n_samples : int, default=1

Number of samples to generate.

Returns:
X : array-like of shape (n_samples, 1)

The generated samples.
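The equivalent draw with scipy.stats.johnsonsu looks as follows (hypothetical parameter values; note that scipy's rvs returns a 1-D array, whereas this estimator's sample returns a column of shape (n_samples, 1)):

```python
import numpy as np
from scipy.stats import johnsonsu

# Hypothetical fitted parameters (a, b, loc, scale)
a, b, loc, scale = 0.07, 1.08, 0.001, 0.008

# Draw 5 samples; reshape to a column to match the estimator's output shape
samples = johnsonsu.rvs(a, b, loc=loc, scale=scale, size=5, random_state=0)
column = samples.reshape(-1, 1)
```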

score(X, y=None)#

Compute the total log-likelihood under the model.

Parameters:
X : array-like of shape (n_observations, n_features)

An array of data points for which the total log-likelihood is computed.

y : None

Ignored. Provided for compatibility with scikit-learn’s API.

Returns:
logprob : float

The total log-likelihood (sum of log-pdf values).

score_samples(X)#

Compute the log-likelihood of each sample (log-pdf) under the model.

Parameters:
X : array-like of shape (n_observations, 1)

An array of points at which to evaluate the log-probability density. The data should be a single feature column.

Returns:
density : ndarray of shape (n_observations,)

Log-likelihood values for each observation in X.
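The relationship between the per-observation log-pdf and the total log-likelihood returned by score can be sketched with scipy.stats.johnsonsu (arbitrary shape values; the standardized distribution stands in for a fitted model):

```python
import numpy as np
from scipy.stats import johnsonsu

# Simulate observations from a standardized Johnson SU
rng = np.random.default_rng(42)
x = johnsonsu.rvs(0.0, 1.0, size=500, random_state=rng)

# Per-observation log-pdf (the analogue of score_samples) ...
log_pdf = johnsonsu.logpdf(x, 0.0, 1.0)
# ... sums to the total log-likelihood (the analogue of score)
total = log_pdf.sum()
```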

set_params(**params)#

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:
**params : dict

Estimator parameters.

Returns:
self : estimator instance

Estimator instance.