skfolio.distribution.BaseUnivariateDist
- class skfolio.distribution.BaseUnivariateDist(random_state=None)
Base Univariate Distribution Estimator.
This abstract class serves as a foundation for univariate distribution models based on scipy.
- Parameters:
- random_state : int, RandomState instance or None, default=None
Seed or random state to ensure reproducibility.
- Attributes:
fitted_repr
String representation of the fitted univariate distribution.
n_params
Number of model parameters.
Methods
aic(X)
Compute the Akaike Information Criterion (AIC) for the model given data X.
bic(X)
Compute the Bayesian Information Criterion (BIC) for the model given data X.
cdf(X)
Compute the cumulative distribution function (CDF) for the given data.
fit(X[, y])
Fit the univariate distribution model.
get_metadata_routing()
Get metadata routing of this object.
get_params([deep])
Get parameters for this estimator.
plot_pdf([X, title])
Plot the probability density function (PDF).
ppf(X)
Compute the percent point function (inverse of the CDF) for the given probabilities.
qq_plot(X[, title])
Plot the empirical quantiles of the sample X versus the quantiles of the fitted model.
sample([n_samples])
Generate random samples from the fitted distribution.
score(X[, y])
Compute the total log-likelihood under the model.
score_samples(X)
Compute the log-likelihood of each sample (log-pdf) under the model.
set_params(**params)
Set the parameters of this estimator.
- aic(X)
Compute the Akaike Information Criterion (AIC) for the model given data X.
The AIC is defined as:
\[\mathrm{AIC} = -2 \, \log L \;+\; 2 k,\]
where
- \(\log L\) is the (maximized) total log-likelihood
- \(k\) is the number of parameters in the model
A lower AIC value indicates a better trade-off between model fit and complexity.
- Parameters:
- X : array-like of shape (n_observations, n_features)
The input data on which to compute the AIC.
- Returns:
- aic : float
The AIC of the fitted model on the given data.
Notes
In practice, both AIC and BIC measure the trade-off between model fit and complexity, but BIC tends to prefer simpler models for large \(n\) because of the \(\ln(n)\) term.
References
[1] “A new look at the statistical model identification”, Akaike (1974).
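The AIC definition above can be checked by hand. The following is a minimal sketch, not skfolio's implementation: it uses the standard library's `statistics.NormalDist` as a stand-in for a fitted univariate model, and the helper names `total_log_likelihood` and `aic` as well as the toy data are hypothetical.

```python
import math
from statistics import NormalDist


def total_log_likelihood(dist, data):
    """Sum of the log-pdf values of `data` under `dist`."""
    return sum(math.log(dist.pdf(x)) for x in data)


def aic(log_likelihood, n_params):
    """AIC = -2 * log L + 2 * k."""
    return -2.0 * log_likelihood + 2.0 * n_params


# Toy sample (assumed data, for illustration only).
data = [-0.8, -0.3, 0.1, 0.4, 0.9, 1.2]

# Stand-in for a fitted model: a Gaussian estimated from the sample.
# Note: from_samples uses the sample standard deviation, so this is
# close to, but not exactly, the maximum-likelihood fit.
model = NormalDist.from_samples(data)

log_l = total_log_likelihood(model, data)
# A Gaussian has two parameters: mean and standard deviation.
aic_value = aic(log_l, n_params=2)
print(f"log L = {log_l:.4f}, AIC = {aic_value:.4f}")
```

When comparing candidate models on the same data, the one with the lower value would be preferred under this criterion.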
- bic(X)
Compute the Bayesian Information Criterion (BIC) for the model given data X.
The BIC is defined as:
\[\mathrm{BIC} = -2 \, \log L \;+\; k \,\ln(n),\]
where
- \(\log L\) is the (maximized) total log-likelihood
- \(k\) is the number of parameters in the model
- \(n\) is the number of observations
A lower BIC value suggests a better fit while imposing a stronger penalty for model complexity than the AIC.
- Parameters:
- X : array-like of shape (n_observations, n_features)
The input data on which to compute the BIC.
- Returns:
- bic : float
The BIC of the fitted model on the given data.
Notes
In practice, both AIC and BIC measure the trade-off between model fit and complexity, but BIC tends to prefer simpler models for large \(n\) because of the \(\ln(n)\) term.
References
[1] “Estimating the dimension of a model”, Schwarz, G. (1978).
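To make the effect of the \(\ln(n)\) term concrete, the sketch below compares the complexity penalties of the two criteria for a few sample sizes. It is an illustration of the formulas only; the function names and the value of `k` are hypothetical and not part of skfolio.

```python
import math


def aic_penalty(n_params):
    """Complexity penalty in the AIC: 2 * k."""
    return 2.0 * n_params


def bic_penalty(n_params, n_observations):
    """Complexity penalty in the BIC: k * ln(n)."""
    return n_params * math.log(n_observations)


def bic(log_likelihood, n_params, n_observations):
    """BIC = -2 * log L + k * ln(n)."""
    return -2.0 * log_likelihood + bic_penalty(n_params, n_observations)


k = 3  # hypothetical number of model parameters
for n in (5, 8, 100, 10_000):
    print(
        f"n={n}: AIC penalty={aic_penalty(k):.3f}, "
        f"BIC penalty={bic_penalty(k, n):.3f}"
    )
```

The BIC penalty exceeds the AIC penalty as soon as \(\ln(n) > 2\), that is for \(n \geq 8\), and the gap keeps widening as \(n\) grows.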
- cdf(X)
Compute the cumulative distribution function (CDF) for the given data.
- Parameters:
- X : array-like of shape (n_observations, 1)
Data points at which to evaluate the CDF.
- Returns:
- cdf : ndarray of shape (n_observations, 1)
The CDF evaluated at each data point.
- abstract fit(X, y=None)
Fit the univariate distribution model.
- Parameters:
- X : array-like of shape (n_observations, 1)
The input data. X must contain a single column.
- y : None
Ignored. Provided for compatibility with scikit-learn’s API.
- Returns:
- self : BaseUnivariateDist
Returns the instance itself.
- property fitted_repr
String representation of the fitted univariate distribution.
- get_metadata_routing()
Get metadata routing of this object.
See the User Guide for how the routing mechanism works.
- Returns:
- routing : MetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)
Get parameters for this estimator.
- Parameters:
- deep : bool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- params : dict
Parameter names mapped to their values.
- property n_params
Number of model parameters.
- plot_pdf(X=None, title=None)
Plot the probability density function (PDF).
- Parameters:
- X : array-like of shape (n_samples, 1), optional
If provided, the empirical KDE of the data is plotted for comparison against the model PDF.
- title : str, optional
The title for the plot. If not provided, a default title based on the fitted model’s representation is used.
- Returns:
- fig : go.Figure
A Plotly figure object containing the PDF plot.
- ppf(X)
Compute the percent point function (inverse of the CDF) for the given probabilities.
- Parameters:
- X : array-like of shape (n_observations, 1)
Probabilities for which to compute the corresponding quantiles.
- Returns:
- ppf : ndarray of shape (n_observations, 1)
The quantiles corresponding to the given probabilities.
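Because the percent point function is the inverse of the CDF, applying one after the other should return the original points. The sketch below illustrates this relationship with the standard library's `statistics.NormalDist` as a stand-in for a fitted model (an assumption made for illustration). The skfolio methods take column arrays of shape (n_observations, 1), whereas this sketch uses plain Python lists.

```python
from statistics import NormalDist

# Stand-in for a fitted univariate model (standard normal).
model = NormalDist(mu=0.0, sigma=1.0)

points = [-1.5, 0.0, 0.7, 2.0]

# CDF: data points -> probabilities in (0, 1).
probabilities = [model.cdf(x) for x in points]

# PPF (inverse CDF): probabilities -> quantiles.
recovered = [model.inv_cdf(p) for p in probabilities]

for x, p, q in zip(points, probabilities, recovered):
    print(f"x={x:+.2f}  cdf={p:.6f}  ppf(cdf)={q:+.6f}")
```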
- qq_plot(X, title=None)
Plot the empirical quantiles of the sample X versus the quantiles of the fitted model.
- Parameters:
- X : array-like of shape (n_samples, 1)
Used to plot the empirical quantiles for comparison versus the model quantiles.
- title : str, optional
The title for the plot. If not provided, a default title based on the fitted model’s representation is used.
- Returns:
- fig : go.Figure
A Plotly figure object containing the Q-Q plot.
- sample(n_samples=1)
Generate random samples from the fitted distribution.
Currently, this is implemented only for gaussian and tophat kernels.
- Parameters:
- n_samples : int, default=1
Number of samples to generate.
- Returns:
- X : array-like of shape (n_samples, 1)
List of samples.
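The role of a seed in sampling can be sketched as follows. This is not skfolio code: it uses the standard library's `statistics.NormalDist` as a stand-in for a fitted model, and its `seed` argument plays a role analogous to the `random_state` constructor parameter described above.

```python
from statistics import NormalDist

# Stand-in for a fitted univariate model (standard normal).
model = NormalDist(mu=0.0, sigma=1.0)

# The same seed yields the same draws, which makes results reproducible.
first = model.samples(5, seed=42)
second = model.samples(5, seed=42)

# A different seed yields a different sequence of draws.
other = model.samples(5, seed=7)

print(first == second)
print(first == other)
```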
- score(X, y=None)
Compute the total log-likelihood under the model.
- Parameters:
- X : array-like of shape (n_observations, n_features)
An array of data points for which the total log-likelihood is computed.
- y : None
Ignored. Provided for compatibility with scikit-learn’s API.
- Returns:
- logprob : float
The total log-likelihood (sum of log-pdf values).
- score_samples(X)
Compute the log-likelihood of each sample (log-pdf) under the model.
- Parameters:
- X : array-like of shape (n_observations, 1)
An array of points at which to evaluate the log-probability density. The data should be a single feature column.
- Returns:
- density : ndarray of shape (n_observations,)
Log-likelihood values for each observation in X.
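The relationship between the per-sample log-likelihoods and the total log-likelihood returned by `score` can be sketched as below. The example is an illustration under assumptions: `statistics.NormalDist` stands in for a fitted model, and the data are hypothetical.

```python
import math
from statistics import NormalDist

# Stand-in for a fitted univariate model (standard normal).
model = NormalDist(mu=0.0, sigma=1.0)

data = [-1.0, 0.0, 0.5, 2.0]

# Per-sample log-likelihood: one log-pdf value per observation.
log_pdf = [math.log(model.pdf(x)) for x in data]

# Total log-likelihood: the sum of the per-sample values.
total = sum(log_pdf)

print(log_pdf)
print(total)
```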
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
- **params : dict
Estimator parameters.
- Returns:
- self : estimator instance
Estimator instance.