skfolio.preprocessing.CSGaussianRankScaler
- class skfolio.preprocessing.CSGaussianRankScaler(*, min_group_size=8, scale=True, atol=1e-12)
Cross-sectional rank Gaussianization.
Computes percentile ranks within each cross-section (see CSPercentileRankScaler), maps them through the inverse standard normal CDF \(\Phi^{-1}\), and recenters to weighted mean zero over the estimation universe. When scale=True, the result is also rescaled to unit equal-weighted standard deviation.
When cs_weights is provided, the estimation universe is defined by cs_weights > 0. Assets outside that universe still receive Gaussianized scores relative to it. cs_weights is used to define the estimation universe and to compute the final weighted recentering; ranking itself remains equal-weighted over the selected assets.
NaNs are treated as missing values. They are ignored when computing cross-sectional ranks and are preserved in the output.
For observation \(t\), the Gaussianized value of asset \(i\) is:
\[z_{t,i} = \frac{\Phi^{-1}(p_{t,i}) - \mu_t}{\sigma_t}\]
where \(p_{t,i}\) is the percentile rank, \(\mu_t\) the weighted mean of \(\Phi^{-1}(p_{t,\cdot})\) over the estimation universe, and \(\sigma_t\) its unbiased equal-weighted standard deviation. When scale=False, the rescaling step is skipped and only weighted recentering is applied.
When cs_groups is provided, the same scheme is applied within each group. Groups with fewer than min_group_size estimation assets, and missing groups (cs_groups == -1), fall back to the global cross-section. Recentering and rescaling are always performed over the full cross-section, not within groups.
This transformer is stateless.
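The formula above can be sketched for a single cross-section in plain Python. This is a minimal illustration, not the library's implementation: it covers only the equal-weight case without groups, assumes no ties, and assumes the percentile rank is the midpoint convention (rank - 0.5) / n, which is a guess about CSPercentileRankScaler that happens to agree with the values in the Examples section. The helper name gaussianize_row and the choice to skip rescaling when the standard deviation is below atol are likewise assumptions.

```python
import math
from statistics import NormalDist


def gaussianize_row(row, scale=True, atol=1e-12):
    """Gaussianize one cross-section (equal weights, no groups, no ties)."""
    # NaNs are ignored for ranking and preserved in the output.
    valid = [i for i, x in enumerate(row) if not math.isnan(x)]
    n = len(valid)

    # Assumed percentile convention: p = (rank - 0.5) / n, rank starting at 1.
    order = sorted(valid, key=lambda i: row[i])
    scores = {
        i: NormalDist().inv_cdf((rank - 0.5) / n)
        for rank, i in enumerate(order, start=1)
    }

    # With equal weights the weighted mean is the plain mean.
    mu = sum(scores.values()) / n
    centered = {i: g - mu for i, g in scores.items()}

    if scale and n > 1:
        # Unbiased (n - 1) standard deviation of the centered scores.
        sigma = math.sqrt(sum(c * c for c in centered.values()) / (n - 1))
        if sigma > atol:  # assumed guard: skip rescaling when sigma is ~0
            centered = {i: c / sigma for i, c in centered.items()}

    return [centered.get(i, math.nan) for i in range(len(row))]


print(gaussianize_row([1.0, math.nan, 3.0, 4.0]))
print(gaussianize_row([4.0, 3.0, 2.0, 1.0]))
```

Under these assumptions the two calls should agree with the first two rows of the unweighted example to the printed precision.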
- Parameters:
- min_group_size : int, default=8
Minimum number of estimation assets required in a group. Smaller groups fall back to the global cross-section.
- scale : bool, default=True
If True, rescale final exposures to unit equal-weighted standard deviation over the estimation universe. If False, only weighted recentering is applied. Use scale=False when feeding the output to a scale-invariant downstream model (e.g. gradient-boosted trees) and you want to avoid injecting per-cross-section noise from the unbiased standard-deviation estimate.
- atol : float, default=1e-12
Absolute tolerance used to guard against division by a near-zero equal-weighted standard deviation. Must be finite and non-negative.
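The min_group_size fallback rule can be illustrated with a small helper. This is a sketch of the rule as described in the text, not code from the library; effective_groups is a hypothetical name, and returning -1 to mean "rank against the global cross-section" is a convention chosen here for illustration.

```python
from collections import Counter


def effective_groups(groups, in_universe, min_group_size=8):
    """Map each asset to the group it is ranked in, or -1 for the global cross-section."""
    # Count estimation assets per labelled group (label -1 means "missing group").
    counts = Counter(g for g, u in zip(groups, in_universe) if u and g != -1)
    # A group is kept only if it has enough estimation assets.
    return [g if g != -1 and counts[g] >= min_group_size else -1 for g in groups]


# Group 0 has a single estimation asset, so it falls back; group 1 has two.
print(effective_groups([0, 0, 1, 1], [True, False, True, True], min_group_size=2))
```

With the default min_group_size=8, a four-asset cross-section like this one would fall back to the global cross-section entirely.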
Methods
- fit(X[, y, cs_weights, cs_groups]): Fit the transformer.
- fit_transform(X[, y, cs_weights, cs_groups]): Fit to X and return the transformed values.
- get_feature_names_out([input_features]): Get output feature names for transformation.
- get_metadata_routing(): Get metadata routing of this object.
- get_params([deep]): Get parameters for this estimator.
- set_fit_request(*[, cs_groups, cs_weights]): Configure whether metadata should be requested to be passed to the fit method.
- set_params(**params): Set the parameters of this estimator.
- set_transform_request(*[, cs_groups, cs_weights]): Configure whether metadata should be requested to be passed to the transform method.
- transform(X[, cs_weights, cs_groups]): Transform values into cross-sectional Gaussianized exposures.
Examples
>>> import numpy as np
>>> from skfolio.preprocessing import CSGaussianRankScaler
>>>
>>> X = np.array([[1.0, np.nan, 3.0, 4.0],
...               [4.0, 3.0, 2.0, 1.0],
...               [10.0, 20.0, np.nan, 40.0]])
>>>
>>> transformer = CSGaussianRankScaler()
>>> transformer.fit_transform(X)
array([[-1.        ,         nan,  0.        ,  1.        ],
       [ 1.180302  ,  0.32693605, -0.32693605, -1.180302  ],
       [-1.        ,  0.        ,         nan,  1.        ]])
>>>
>>> # Use cs_weights for the estimation universe and weighted recentering, and rank within groups.
>>> cs_weights = np.array([[3.0, 0.0, 1.0, 2.0],
...                        [4.0, 0.0, 2.0, 3.0],
...                        [2.0, 3.0, 0.0, 5.0]])
>>> cs_groups = np.array([[0, 0, 1, 1],
...                       [0, 0, 1, 1],
...                       [0, 0, 1, 1]])
>>>
>>> transformer = CSGaussianRankScaler(min_group_size=2)
>>> transformer.fit_transform(X, cs_weights=cs_weights, cs_groups=cs_groups)
array([[-0.6791367 ,         nan, -0.34541391,  1.19141201],
       [ 0.69857792,  0.0863589 ,  0.36442412, -1.17438663],
       [-1.33305449,  0.13413753,         nan,  0.45273928]])
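The first row of the weighted, grouped example can be traced by hand with the standard library. The sketch below is one reading of the description, not the library's code. It rests on several assumptions: midpoint percentiles (rank - 0.5) / n, no ties, and a standard deviation taken as the root of the squared deviations from the weighted mean divided by n - 1, which is the variant that reproduces the documented numbers. The function name trace_row is hypothetical. As a simplification, assets outside the estimation universe are returned as NaN here, whereas the transformer still scores them.

```python
import math
from statistics import NormalDist


def trace_row(x, w, groups, min_group_size):
    """Gaussianize one cross-section with weights and groups (simplified sketch)."""
    n_assets = len(x)
    # Estimation universe: positive weight and a non-missing value.
    universe = [i for i in range(n_assets) if w[i] > 0 and not math.isnan(x[i])]

    def peers(i):
        # Estimation assets that asset i is ranked against.
        if groups[i] == -1:
            return universe
        same = [j for j in universe if groups[j] == groups[i]]
        # Groups that are too small fall back to the global cross-section.
        return same if len(same) >= min_group_size else universe

    # Equal-weighted rank within the peer set, then inverse normal CDF.
    raw = {}
    for i in universe:
        pool = peers(i)
        rank = 1 + sum(x[j] < x[i] for j in pool)
        raw[i] = NormalDist().inv_cdf((rank - 0.5) / len(pool))

    # Weighted recentering over the full estimation universe.
    mu = sum(w[i] * raw[i] for i in universe) / sum(w[i] for i in universe)
    centered = {i: raw[i] - mu for i in universe}

    # Equal-weighted rescaling over the full estimation universe (assumed form).
    sigma = math.sqrt(sum(c * c for c in centered.values()) / (len(universe) - 1))
    return [centered[i] / sigma if i in centered else math.nan for i in range(n_assets)]


row = trace_row([1.0, math.nan, 3.0, 4.0], [3.0, 0.0, 1.0, 2.0], [0, 0, 1, 1], 2)
print([round(v, 6) for v in row])
```

In this row, group 0 contains a single estimation asset, so asset 0 is ranked against the global cross-section, while assets 2 and 3 are ranked within group 1. Recentering and rescaling then use all three estimation assets together.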
- fit(X, y=None, cs_weights=None, cs_groups=None)
Fit the transformer.
Cross-sectional transformers are stateless and do not learn data-dependent parameters. This method validates the estimator parameters, validates X, and records n_features_in_ for scikit-learn compatibility.
- Parameters:
- X : array-like of shape (n_observations, n_assets)
Input matrix where each row is an observation and each column is an asset.
- y : Ignored
Not used, present for API consistency by convention.
- cs_weights : array-like of shape (n_observations, n_assets), optional
Optional cross-sectional weights accepted for API consistency with transform. They are ignored during fitting.
- cs_groups : array-like of shape (n_observations, n_assets), optional
Optional cross-sectional group labels accepted for API consistency with transform. They are ignored during fitting.
- Returns:
- self : BaseCSTransformer
Fitted estimator.
- fit_transform(X, y=None, cs_weights=None, cs_groups=None)
Fit to X and return the transformed values.
- Parameters:
- X : array-like of shape (n_observations, n_assets)
Input matrix where each row is an observation and each column is an asset.
- y : Ignored
Not used, present for API consistency by convention.
- cs_weights : array-like of shape (n_observations, n_assets), optional
Optional cross-sectional weights forwarded to transform.
- cs_groups : array-like of shape (n_observations, n_assets), optional
Optional cross-sectional group labels forwarded to transform.
- Returns:
- X_new : ndarray of shape (n_observations, n_assets)
Transformed array.
- get_feature_names_out(input_features=None)
Get output feature names for transformation.
- Parameters:
- input_features : array-like of str or None, default=None
Input features.
If input_features is None, then feature_names_in_ is used as feature names in. If feature_names_in_ is not defined, then the following input feature names are generated: ["x0", "x1", ..., "x(n_features_in_ - 1)"].
If input_features is an array-like, then input_features must match feature_names_in_ if feature_names_in_ is defined.
- Returns:
- feature_names_out : ndarray of str objects
Same as input features.
- get_metadata_routing()
Get metadata routing of this object.
Please check the User Guide on how the routing mechanism works.
- Returns:
- routing : MetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)
Get parameters for this estimator.
- Parameters:
- deep : bool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- params : dict
Parameter names mapped to their values.
- set_fit_request(*, cs_groups='$UNCHANGED$', cs_weights='$UNCHANGED$')
Configure whether metadata should be requested to be passed to the fit method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to fit.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- Parameters:
- cs_groups : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for cs_groups parameter in fit.
- cs_weights : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for cs_weights parameter in fit.
- Returns:
- self : object
The updated object.
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
- Parameters:
- **params : dict
Estimator parameters.
- Returns:
- self : estimator instance
Estimator instance.
- set_transform_request(*, cs_groups='$UNCHANGED$', cs_weights='$UNCHANGED$')
Configure whether metadata should be requested to be passed to the transform method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to transform.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- Parameters:
- cs_groups : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for cs_groups parameter in transform.
- cs_weights : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for cs_weights parameter in transform.
- Returns:
- self : object
The updated object.
- transform(X, cs_weights=None, cs_groups=None)
Transform values into cross-sectional Gaussianized exposures.
- Parameters:
- X : array-like of shape (n_observations, n_assets)
Input matrix where each row is an observation and each column is an asset. NaNs are allowed and preserved.
- cs_weights : array-like of shape (n_observations, n_assets), optional
Optional non-negative cross-sectional weights. They define the estimation universe through cs_weights > 0 and drive the final weighted recentering. Ranking itself remains equal-weighted over the selected assets. If None, all finite assets are included in the estimation universe.
- cs_groups : array-like of shape (n_observations, n_assets), optional
Integer group labels >= -1. Missing groups (-1) and groups with fewer than min_group_size estimation assets fall back to the global cross-section. If None, ranking is performed on the full cross-section of each observation.
- Returns:
- Z : ndarray of shape (n_observations, n_assets)
Gaussianized exposures. Each cross-section has weighted mean zero over its estimation universe and, when scale=True, unit equal-weighted standard deviation. NaNs from X are preserved.
- Raises:
- ValueError
If min_group_size is not an integer >= 1, atol is not finite or < 0, X is not a non-empty 2D array, cs_weights is invalid, cs_groups is invalid, or any observation has no estimation asset.
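The two parameter conditions listed above can be mirrored in a few lines. This is only an illustration of the documented rules; check_params is a hypothetical helper, the error messages are invented for the sketch, and the checks on X, cs_weights and cs_groups are not reproduced.

```python
import math
from numbers import Integral


def check_params(min_group_size, atol):
    """Raise ValueError for parameter values the documentation lists as invalid."""
    # min_group_size must be an integer >= 1.
    if not isinstance(min_group_size, Integral) or min_group_size < 1:
        raise ValueError("min_group_size must be an integer >= 1")
    # atol must be finite and non-negative.
    if not math.isfinite(atol) or atol < 0:
        raise ValueError("atol must be finite and non-negative")


check_params(8, 1e-12)  # the documented defaults pass silently
```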