skfolio.distribution.BaseUnivariateDist

class skfolio.distribution.BaseUnivariateDist(random_state=None)

Base Univariate Distribution Estimator.

This abstract class serves as a foundation for univariate distribution models based on scipy.

Parameters:

random_state : int, RandomState instance or None, default=None: Seed or random state used to ensure reproducibility.

Attributes:

fitted_repr: String representation of the fitted univariate distribution.
n_params: Number of model parameters.

Methods

`aic`(X)	Compute the Akaike Information Criterion (AIC) for the model given data X.
`bic`(X)	Compute the Bayesian Information Criterion (BIC) for the model given data X.
`cdf`(X)	Compute the cumulative distribution function (CDF) for the given data.
`fit`(X[, y])	Fit the univariate distribution model.
`get_metadata_routing`()	Get metadata routing of this object.
`get_params`([deep])	Get parameters for this estimator.
`plot_pdf`([X, title])	Plot the probability density function (PDF).
`ppf`(X)	Compute the percent point function (inverse of the CDF) for the given probabilities.
`qq_plot`(X[, title])	Plot the empirical quantiles of the sample X versus the quantiles of the fitted model.
`sample`([n_samples])	Generate random samples from the fitted distribution.
`score`(X[, y])	Compute the total log-likelihood under the model.
`score_samples`(X)	Compute the log-likelihood of each sample (log-pdf) under the model.
`set_params`(**params)	Set the parameters of this estimator.
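To make the interface concrete, here is a minimal, hypothetical stand-in written directly on top of `scipy.stats.norm`. The class name `HypotheticalGaussianDist` and its `loc_`/`scale_` attributes are assumptions for illustration only: it does not subclass skfolio's `BaseUnivariateDist` and is not skfolio's implementation. It simply shows how `fit`, `n_params`, `fitted_repr`, `score_samples` and `score` can relate to one another for a scipy-backed model.

```python
import numpy as np
from scipy import stats


class HypotheticalGaussianDist:
    """Hypothetical sketch of a scipy-backed univariate model (not skfolio code)."""

    def __init__(self, random_state=None):
        self.random_state = random_state

    def fit(self, X, y=None):
        x = np.asarray(X, dtype=float)
        if x.ndim != 2 or x.shape[1] != 1:
            raise ValueError("X must have shape (n_observations, 1)")
        # Maximum-likelihood estimates of the normal location and scale.
        self.loc_, self.scale_ = stats.norm.fit(x[:, 0])
        return self

    @property
    def n_params(self):
        # Two free parameters: loc and scale.
        return 2

    @property
    def fitted_repr(self):
        return f"HypotheticalGaussianDist({self.loc_:0.3g}, {self.scale_:0.3g})"

    def score_samples(self, X):
        # Per-observation log-pdf, returned with shape (n_observations,).
        x = np.asarray(X, dtype=float)[:, 0]
        return stats.norm.logpdf(x, loc=self.loc_, scale=self.scale_)

    def score(self, X, y=None):
        # Total log-likelihood: sum of the per-observation log-pdf values.
        return float(np.sum(self.score_samples(X)))


rng = np.random.default_rng(0)
X = rng.normal(loc=0.01, scale=0.02, size=(500, 1))
model = HypotheticalGaussianDist(random_state=0).fit(X)
print(model.fitted_repr, model.n_params, model.score(X))
```

Returning `self` from `fit` allows the chained `fit(X)` call above, which is the scikit-learn convention this class documents.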

aic(X)

Compute the Akaike Information Criterion (AIC) for the model given data X.

The AIC is defined as:

\[\mathrm{AIC} = -2 \, \log L \;+\; 2 k,\]

where

\(\log L\) is the total log-likelihood
\(k\) is the number of parameters in the model

A lower AIC value indicates a better trade-off between model fit and complexity.
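As a sketch of how the AIC is used for model selection, the example below compares a normal fit (k = 2) against a Student's t fit (k = 3) on a heavy-tailed sample, using `scipy.stats` maximum-likelihood fits directly rather than skfolio estimators. The choice of candidate models and the simulated data are illustrative assumptions.

```python
import numpy as np
from scipy import stats


def aic(log_likelihood, k):
    # AIC = -2 log L + 2k
    return -2.0 * log_likelihood + 2.0 * k


rng = np.random.default_rng(42)
x = stats.t.rvs(df=3, size=2000, random_state=rng)  # heavy-tailed sample

# Candidate 1: normal (k = 2: loc, scale), fitted by maximum likelihood.
loc_n, scale_n = stats.norm.fit(x)
logl_norm = stats.norm.logpdf(x, loc_n, scale_n).sum()

# Candidate 2: Student's t (k = 3: df, loc, scale).
df_t, loc_t, scale_t = stats.t.fit(x)
logl_t = stats.t.logpdf(x, df_t, loc_t, scale_t).sum()

aic_norm = aic(logl_norm, k=2)
aic_t = aic(logl_t, k=3)
print(f"AIC normal = {aic_norm:.1f}, AIC Student-t = {aic_t:.1f}")
```

Despite its extra parameter, the Student's t model should obtain the lower AIC here, because its gain in log-likelihood on heavy-tailed data far outweighs the penalty of 2 for one additional parameter.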

Parameters:

X : array-like of shape (n_observations, n_features): The input data on which to compute the AIC.

Returns:

aic : float: The AIC of the fitted model on the given data.

Notes

In practice, both AIC and BIC measure the trade-off between model fit and complexity, but BIC tends to prefer simpler models for large \(n\) (the number of observations) because of the \(\ln(n)\) term in its penalty.

References

[1] Akaike, H. (1974). “A new look at the statistical model identification”.

bic(X)

Compute the Bayesian Information Criterion (BIC) for the model given data X.

The BIC is defined as:

\[\mathrm{BIC} = -2 \, \log L \;+\; k \,\ln(n),\]

where

\(\log L\) is the (maximized) total log-likelihood
\(k\) is the number of parameters in the model
\(n\) is the number of observations

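A lower BIC value also indicates a better trade-off between fit and complexity, but the BIC penalizes complexity more heavily than the AIC: its per-parameter penalty \(\ln(n)\) exceeds the AIC's penalty of 2 whenever \(n \ge 8\).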
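The two penalties can be compared with a few lines of standard-library Python. The functions `aic` and `bic` below are illustrative helpers implementing the formulas in this section, and the log-likelihood value is an arbitrary placeholder rather than the output of a fitted model.

```python
import math


def aic(log_likelihood: float, k: int) -> float:
    # AIC = -2 log L + 2k
    return -2.0 * log_likelihood + 2.0 * k


def bic(log_likelihood: float, k: int, n: int) -> float:
    # BIC = -2 log L + k ln(n)
    return -2.0 * log_likelihood + k * math.log(n)


# Per-parameter penalty: AIC charges 2, BIC charges ln(n).
for n in (5, 7, 8, 100, 10_000):
    print(n, 2.0, round(math.log(n), 3))

# For the same log-likelihood, BIC - AIC = k * (ln(n) - 2).
logl, k, n = -1234.5, 3, 1000
print(bic(logl, k, n) - aic(logl, k))
```

Because the gap grows with \(\ln(n)\), the BIC's preference for fewer parameters strengthens as the sample grows.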

Parameters:

X : array-like of shape (n_observations, n_features): The input data on which to compute the BIC.

Returns:

bic : float: The BIC of the fitted model on the given data.

Notes

In practice, both AIC and BIC measure the trade-off between model fit and complexity, but BIC tends to prefer simpler models for large \(n\) because of the \(\ln(n)\) term.

References

[1] Schwarz, G. (1978). “Estimating the dimension of a model”.

cdf(X)

Compute the cumulative distribution function (CDF) for the given data.

Parameters:

X : array-like of shape (n_observations, 1): Data points at which to evaluate the CDF.

Returns:

cdf : ndarray of shape (n_observations, 1): The CDF evaluated at each data point.
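The shape convention above, a single column in and a single column out, is what `scipy.stats` produces naturally because its CDFs act elementwise. The sketch below assumes a normal model with hypothetical fitted parameters; it calls scipy directly and is not skfolio's internal code.

```python
import numpy as np
from scipy import stats

# Hypothetical fitted parameters (not produced by skfolio here).
loc, scale = 0.0, 0.02
X = np.array([[-0.04], [0.0], [0.02]])  # shape (n_observations, 1)

# Elementwise evaluation keeps the (n_observations, 1) shape.
cdf = stats.norm.cdf(X, loc=loc, scale=scale)
print(cdf.shape)  # (3, 1)
print(cdf.ravel())
```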

abstract fit(X, y=None)

Fit the univariate distribution model.

Parameters:

X : array-like of shape (n_observations, 1): The input data. X must contain a single column.
y : None: Ignored. Provided for compatibility with scikit-learn’s API.

Returns:

self : BaseUnivariateDist: Returns the instance itself.
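Because X must contain a single column, a one-dimensional series of returns typically needs reshaping before fitting. The helper `check_single_column` below is hypothetical; it only illustrates the documented shape requirement and is not a skfolio function.

```python
import numpy as np


def check_single_column(X):
    """Hypothetical helper: coerce X to float and require shape (n_observations, 1)."""
    X = np.asarray(X, dtype=float)
    if X.ndim != 2 or X.shape[1] != 1:
        raise ValueError(f"X must have shape (n_observations, 1), got {X.shape}")
    return X


# A 1-D series is reshaped into a single column before fitting.
returns = np.array([0.01, -0.02, 0.005])
ok = check_single_column(returns.reshape(-1, 1))
print(ok.shape)  # (3, 1)

try:
    check_single_column(np.zeros((3, 2)))  # two columns are rejected
except ValueError as err:
    print(err)
```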

property fitted_repr: String representation of the fitted univariate distribution.

get_metadata_routing()

Get metadata routing of this object.

Please check the scikit-learn User Guide on how the routing mechanism works.

Returns:

routing : MetadataRequest: A MetadataRequest encapsulating routing information.

get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep : bool, default=True: If True, return the parameters of this estimator and of any contained sub-objects that are estimators.

Returns:

params : dict: Parameter names mapped to their values.
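Since this class follows scikit-learn's estimator API, the effect of `deep` can be illustrated with scikit-learn's own estimators. The `Pipeline` of `StandardScaler` and `KernelDensity` below is an arbitrary illustrative choice, not a skfolio object: with `deep=False` only the pipeline's own parameters are returned, while `deep=True` adds the nested `<step>__<parameter>` entries.

```python
from sklearn.neighbors import KernelDensity
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

pipe = Pipeline([("scale", StandardScaler()), ("kde", KernelDensity())])

shallow = pipe.get_params(deep=False)
deep = pipe.get_params(deep=True)

print("kde__bandwidth" in shallow)  # False
print("kde__bandwidth" in deep)     # True
```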

property n_params: Number of model parameters.

plot_pdf(X=None, title=None)

Plot the probability density function (PDF).

Parameters:

X : array-like of shape (n_samples, 1), optional: If provided, a kernel density estimate (KDE) of this empirical data is plotted for comparison against the model PDF.
title : str, optional: The title for the plot. If not provided, a default title based on the fitted model’s representation is used.

Returns:

fig : go.Figure: A Plotly figure object containing the PDF plot.
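The comparison that `plot_pdf` draws can be approximated numerically without Plotly. This sketch assumes a normal model and uses `scipy.stats.gaussian_kde` for the empirical curve; the exact KDE method used by `plot_pdf` is not specified here, so treat the KDE choice, the grid, and the simulated data as assumptions.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
X = rng.normal(loc=0.0, scale=1.0, size=(5000, 1))

# Model PDF from maximum-likelihood normal parameters.
loc, scale = stats.norm.fit(X[:, 0])
grid = np.linspace(-4, 4, 201)
model_pdf = stats.norm.pdf(grid, loc=loc, scale=scale)

# Empirical density via a Gaussian KDE of the sample (assumed method).
kde = stats.gaussian_kde(X[:, 0])
empirical_pdf = kde(grid)

max_gap = np.max(np.abs(model_pdf - empirical_pdf))
print(f"max |model PDF - KDE| on grid: {max_gap:.3f}")
```

When the model is well specified, the two curves nearly coincide; large gaps on the plot point to misfit, for example in the tails.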

ppf(X)

Compute the percent point function (inverse of the CDF) for the given probabilities.

Parameters:

X : array-like of shape (n_observations, 1): Probabilities for which to compute the corresponding quantiles.

Returns:

ppf : ndarray of shape (n_observations, 1): The quantiles corresponding to the given probabilities.
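Since the PPF is the inverse of the CDF, applying the CDF to its output recovers the input probabilities. The sketch below checks this round trip with `scipy.stats.norm` and hypothetical fitted parameters, calling scipy directly rather than a skfolio estimator.

```python
import numpy as np
from scipy import stats

loc, scale = 0.0005, 0.01  # hypothetical fitted parameters
probs = np.array([[0.01], [0.05], [0.5], [0.95]])  # shape (n_observations, 1)

quantiles = stats.norm.ppf(probs, loc=loc, scale=scale)
print(quantiles.ravel())

# ppf inverts the CDF: cdf(ppf(p)) recovers p.
roundtrip = stats.norm.cdf(quantiles, loc=loc, scale=scale)
print(np.allclose(roundtrip, probs))  # True
```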

qq_plot(X, title=None)

Plot the empirical quantiles of the sample X versus the quantiles of the fitted model.

Parameters:

X : array-like of shape (n_samples, 1): Sample whose empirical quantiles are plotted against the model quantiles.
title : str, optional: The title for the plot. If not provided, a default title based on the fitted model’s representation is used.

Returns:

fig : go.Figure: A Plotly figure object containing the Q-Q plot.
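The points on a Q-Q plot pair each sorted observation with the model quantile at a matching probability. This sketch computes those coordinates for a normal fit to a Student's t sample; the plotting positions \((i - 0.5)/n\) are one common convention and are an assumption here, as is the simulated data. The Plotly rendering is omitted.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
X = rng.standard_t(df=4, size=(1000, 1))

loc, scale = stats.norm.fit(X[:, 0])

empirical_q = np.sort(X[:, 0])
n = empirical_q.size
# Plotting positions (i - 0.5) / n: one common convention, assumed here.
probs = (np.arange(1, n + 1) - 0.5) / n
model_q = stats.norm.ppf(probs, loc=loc, scale=scale)

# Points (model_q, empirical_q) lie near the 45-degree line for a good fit;
# heavier tails than the model appear as departures at both ends.
print(empirical_q[:3], model_q[:3])
```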

sample(n_samples=1)

Generate random samples from the fitted distribution.

Parameters:

n_samples : int, default=1: Number of samples to generate.

Returns:

X : array-like of shape (n_samples, 1): The generated samples.
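Reproducibility follows from seeding the draw, which is the role of the class's `random_state` parameter. The sketch below is a hypothetical `sample` function built on `scipy.stats.norm.rvs` with assumed parameters; it illustrates the documented output shape and seeding behavior, not skfolio's internal code.

```python
import numpy as np
from scipy import stats

loc, scale = 0.0, 0.02  # hypothetical fitted parameters


def sample(n_samples=1, random_state=None):
    # Hypothetical helper: draw with shape (n_samples, 1) to match the documented return shape.
    return stats.norm.rvs(
        loc=loc, scale=scale, size=(n_samples, 1), random_state=random_state
    )


a = sample(5, random_state=123)
b = sample(5, random_state=123)
print(a.shape)               # (5, 1)
print(np.array_equal(a, b))  # True: same seed, same draws
```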

score(X, y=None)

Compute the total log-likelihood under the model.

Parameters:

X : array-like of shape (n_observations, n_features): An array of data points for which the total log-likelihood is computed.
y : None: Ignored. Provided for compatibility with scikit-learn’s API.

Returns:

logprob : float: The total log-likelihood (sum of log-pdf values).

score_samples(X)

Compute the log-likelihood of each sample (log-pdf) under the model.

Parameters:

X : array-like of shape (n_observations, 1): An array of points at which to evaluate the log-probability density. The data must consist of a single feature column.

Returns:

density : ndarray of shape (n_observations,): Log-likelihood values for each observation in X.
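The relationship between `score_samples` and `score` can be sketched with `scipy.stats.norm` and hypothetical parameters: the per-observation log-pdf takes a single column and returns a flat array, and the total log-likelihood is its sum. This calls scipy directly and is not skfolio's code.

```python
import numpy as np
from scipy import stats

loc, scale = 0.0, 1.0  # hypothetical fitted parameters
X = np.array([[-1.0], [0.0], [2.0]])  # shape (n_observations, 1)

# score_samples: per-observation log-pdf, flattened to shape (n_observations,)
log_density = stats.norm.logpdf(X[:, 0], loc=loc, scale=scale)

# score: total log-likelihood = sum of the per-observation values
total = log_density.sum()

print(log_density.shape)  # (3,)
print(np.allclose(log_density, np.log(stats.norm.pdf(X[:, 0], loc, scale))))  # True
```

Working in log space avoids the underflow that multiplying many small densities would cause.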

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
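The `<component>__<parameter>` syntax can be illustrated with scikit-learn's own estimators, since this class follows the same API. The `Pipeline` and its step names `scale` and `kde` are arbitrary illustrative choices, not skfolio objects.

```python
from sklearn.neighbors import KernelDensity
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Simple estimator: parameters are set by their plain names.
kde_alone = KernelDensity().set_params(bandwidth=0.3)

# Nested object: "<component>__<parameter>" targets a parameter of a named step.
pipe = Pipeline([("scale", StandardScaler()), ("kde", KernelDensity())])
pipe.set_params(kde__bandwidth=0.5, scale__with_mean=False)

print(kde_alone.bandwidth)                  # 0.3
print(pipe.named_steps["kde"].bandwidth)    # 0.5
print(pipe.named_steps["scale"].with_mean)  # False
```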

Parameters:

**params : dict: Estimator parameters.

Returns:

self : estimator instance: Estimator instance.