skfolio.preprocessing.CSPercentileRankScaler
- class skfolio.preprocessing.CSPercentileRankScaler(*, min_group_size=8)
Cross-sectional percentile rank.
Computes the percentile rank of each finite value within an observation’s cross-section.
When cs_weights is provided, percentile ranks are estimated only on the estimation universe, defined by cs_weights > 0. Assets outside that universe still receive percentile ranks relative to it. For this estimator, cs_weights is used only to define the estimation universe; percentile estimation itself remains equal-weighted over the selected assets.
NaNs are treated as missing values. They are ignored when computing cross-sectional ranks and are preserved in the output.
When cs_groups is None, ranks are computed globally within each observation using the formula:
\[p_{t,i} = \frac{r_{t,i} - 0.5}{N_{\mathcal{E}_t}}\]
where \(\mathcal{E}_t\) is the estimation universe at observation \(t\), \(N_{\mathcal{E}_t}\) its size, and \(r_{t,i} \in [1, N_{\mathcal{E}_t}]\) is the rank of asset \(i\) within that universe. Tied values share the average of the ranks they would otherwise occupy (equivalent to scipy.stats.rankdata(method="average")).
The \(-0.5\) shift centers each rank inside its bin, so percentiles lie strictly inside \((0, 1)\), within the closed interval \([0.5 / N_{\mathcal{E}_t},\, 1 - 0.5 / N_{\mathcal{E}_t}]\). This keeps downstream inverse-normal mappings finite.
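The formula above can be sketched with plain NumPy. This is an illustration only, not the library's implementation: the helper name cs_percentile_rank is hypothetical, and the sketch assumes no cs_weights and no cs_groups, so every finite value belongs to the estimation universe. It uses the identity that an average rank equals the number of strictly smaller values plus half of (number of equal values + 1).

```python
import numpy as np


def cs_percentile_rank(x):
    """Hypothetical helper: (r - 0.5) / N for one cross-section, NaNs kept."""
    x = np.asarray(x, dtype=float)
    out = np.full(x.shape, np.nan)
    finite = np.isfinite(x)
    values = x[finite]
    n = values.size
    if n == 0:
        return out
    # Average rank with ties: r = (#strictly smaller) + (#equal + 1) / 2,
    # hence r - 0.5 = (#strictly smaller) + 0.5 * (#equal, self included).
    less = (values[None, :] < values[:, None]).sum(axis=1)
    equal = (values[None, :] == values[:, None]).sum(axis=1)
    out[finite] = (less + 0.5 * equal) / n
    return out


X = np.array([[1.0, np.nan, 3.0, 4.0],
              [4.0, 3.0, 2.0, 1.0],
              [10.0, 20.0, np.nan, 40.0]])

# Rank each observation (row) independently.
P = np.vstack([cs_percentile_rank(row) for row in X])
print(P)
```

Applied to the matrix used in the Examples section, the sketch gives 1/6, 1/2 and 5/6 for the rows with three finite values, and 0.875, 0.625, 0.375, 0.125 for the full row.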
When cs_groups is provided, the same ranking scheme is applied within each group. Groups with fewer than min_group_size estimation assets, and missing groups (cs_groups == -1), fall back to the global cross-section.
This transformer is stateless.
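The group fallback can be sketched for a single observation as follows. This is a hedged illustration, not the library's code: the helper names rank_against and grouped_percentile_rank are hypothetical, and the way assets outside the estimation universe are placed (count of strictly smaller universe values plus half the count of equal ones, divided by the universe size) is an assumption. That convention can return exactly 0 or 1 for an outside asset below or above the whole universe, which the library may handle differently.

```python
import numpy as np


def rank_against(values, universe):
    """Equal-weighted percentile of each value relative to `universe`."""
    less = (universe[None, :] < values[:, None]).sum(axis=1)
    equal = (universe[None, :] == values[:, None]).sum(axis=1)
    return (less + 0.5 * equal) / universe.size


def grouped_percentile_rank(x, weights, groups, min_group_size=8):
    """Hypothetical sketch for one observation (one row)."""
    x = np.asarray(x, dtype=float)
    weights = np.asarray(weights, dtype=float)
    groups = np.asarray(groups)

    finite = np.isfinite(x)
    estimation = finite & (weights > 0)  # estimation universe
    if not estimation.any():
        raise ValueError("observation has no estimation asset")

    out = np.full(x.shape, np.nan)
    # Start from the global cross-section; it also serves as the fallback.
    out[finite] = rank_against(x[finite], x[estimation])

    # Overwrite with within-group ranks where the group is large enough.
    # Label -1 (missing group) is skipped and keeps the global rank.
    for g in np.unique(groups[groups >= 0]):
        in_group = groups == g
        group_universe = estimation & in_group
        if group_universe.sum() >= min_group_size:
            target = finite & in_group
            out[target] = rank_against(x[target], x[group_universe])
    return out


X = np.array([[1.0, np.nan, 3.0, 4.0],
              [4.0, 3.0, 2.0, 1.0],
              [10.0, 20.0, np.nan, 40.0]])
W = np.array([[1.0, 0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0, 1.0],
              [1.0, 1.0, 0.0, 1.0]])
G = np.array([[0, 0, 1, 1],
              [0, 0, 1, 1],
              [0, 0, 1, 1]])

P = np.vstack([grouped_percentile_rank(x, w, g, min_group_size=2)
               for x, w, g in zip(X, W, G)])
print(P)
```

With min_group_size=2, group 0 in the first two rows has a single estimation asset, so its members take their rank from the global universe, while group 1 is ranked internally.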
- Parameters:
- min_group_size : int, default=8
Minimum number of estimation assets required in a group. Smaller groups fall back to the global cross-section.
Methods
- fit(X[, y, cs_weights, cs_groups]): Fit the transformer.
- fit_transform(X[, y, cs_weights, cs_groups]): Fit to X and return the transformed values.
- get_feature_names_out([input_features]): Get output feature names for transformation.
- get_metadata_routing(): Get metadata routing of this object.
- get_params([deep]): Get parameters for this estimator.
- set_fit_request(*[, cs_groups, cs_weights]): Configure whether metadata should be requested to be passed to the fit method.
- set_params(**params): Set the parameters of this estimator.
- set_transform_request(*[, cs_groups, cs_weights]): Configure whether metadata should be requested to be passed to the transform method.
- transform(X[, cs_weights, cs_groups]): Transform values into cross-sectional percentile ranks.
Examples
>>> import numpy as np
>>> from skfolio.preprocessing import CSPercentileRankScaler
>>>
>>> X = np.array([[1.0, np.nan, 3.0, 4.0],
...               [4.0, 3.0, 2.0, 1.0],
...               [10.0, 20.0, np.nan, 40.0]])
>>>
>>> transformer = CSPercentileRankScaler()
>>> transformer.fit_transform(X)
array([[0.16666667,        nan, 0.5       , 0.83333333],
       [0.875     , 0.625     , 0.375     , 0.125     ],
       [0.16666667, 0.5       ,        nan, 0.83333333]])
>>>
>>> # Restrict the estimation universe with cs_weights and rank within groups.
>>> cs_weights = np.array([[1.0, 0.0, 1.0, 1.0],
...                        [1.0, 0.0, 1.0, 1.0],
...                        [1.0, 1.0, 0.0, 1.0]])
>>> cs_groups = np.array([[0, 0, 1, 1],
...                       [0, 0, 1, 1],
...                       [0, 0, 1, 1]])
>>>
>>> transformer = CSPercentileRankScaler(min_group_size=2)
>>> transformer.fit_transform(X, cs_weights=cs_weights, cs_groups=cs_groups)
array([[0.16666667,        nan, 0.25      , 0.75      ],
       [0.83333333, 0.66666667, 0.75      , 0.25      ],
       [0.25      , 0.75      ,        nan, 0.83333333]])
- fit(X, y=None, cs_weights=None, cs_groups=None)
Fit the transformer.
Cross-sectional transformers are stateless and do not learn data-dependent parameters. This method validates the estimator parameters, validates X, and records n_features_in_ for scikit-learn compatibility.
- Parameters:
- X : array-like of shape (n_observations, n_assets)
Input matrix where each row is an observation and each column is an asset.
- y : Ignored
Not used, present for API consistency by convention.
- cs_weights : array-like of shape (n_observations, n_assets), optional
Optional cross-sectional weights accepted for API consistency with transform. They are ignored during fitting.
- cs_groups : array-like of shape (n_observations, n_assets), optional
Optional cross-sectional group labels accepted for API consistency with transform. They are ignored during fitting.
- Returns:
- self : BaseCSTransformer
Fitted estimator.
- fit_transform(X, y=None, cs_weights=None, cs_groups=None)
Fit to X and return the transformed values.
- Parameters:
- X : array-like of shape (n_observations, n_assets)
Input matrix where each row is an observation and each column is an asset.
- y : Ignored
Not used, present for API consistency by convention.
- cs_weights : array-like of shape (n_observations, n_assets), optional
Optional cross-sectional weights forwarded to transform.
- cs_groups : array-like of shape (n_observations, n_assets), optional
Optional cross-sectional group labels forwarded to transform.
- Returns:
- X_new : ndarray of shape (n_observations, n_assets)
Transformed array.
- get_feature_names_out(input_features=None)
Get output feature names for transformation.
- Parameters:
- input_features : array-like of str or None, default=None
Input features.
If input_features is None, then feature_names_in_ is used as feature names in. If feature_names_in_ is not defined, then the following input feature names are generated: ["x0", "x1", ..., "x(n_features_in_ - 1)"].
If input_features is an array-like, then input_features must match feature_names_in_ if feature_names_in_ is defined.
- Returns:
- feature_names_out : ndarray of str objects
Same as input features.
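The naming rule above can be sketched in pure Python. The helper resolve_feature_names is hypothetical and only mirrors the three cases described; it returns a list where the real method returns an ndarray of str objects, and its error message is illustrative.

```python
def resolve_feature_names(n_features_in, feature_names_in=None, input_features=None):
    """Hypothetical helper mirroring the documented naming rule."""
    if input_features is None:
        if feature_names_in is not None:
            # Names seen during fit take precedence.
            return list(feature_names_in)
        # No names available: generate x0, x1, ..., x(n_features_in - 1).
        return [f"x{i}" for i in range(n_features_in)]

    input_features = list(input_features)
    if feature_names_in is not None and input_features != list(feature_names_in):
        raise ValueError("input_features does not match feature_names_in_")
    return input_features


print(resolve_feature_names(3))
print(resolve_feature_names(2, feature_names_in=["AAPL", "MSFT"]))
```

Because the transformer maps each asset column to one output column, the output names are the input names unchanged.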
- get_metadata_routing()
Get metadata routing of this object.
Please check the User Guide on how the routing mechanism works.
- Returns:
- routing : MetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)
Get parameters for this estimator.
- Parameters:
- deep : bool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- params : dict
Parameter names mapped to their values.
- set_fit_request(*, cs_groups='$UNCHANGED$', cs_weights='$UNCHANGED$')
Configure whether metadata should be requested to be passed to the fit method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to fit.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- Parameters:
- cs_groups : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for cs_groups parameter in fit.
- cs_weights : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for cs_weights parameter in fit.
- Returns:
- self : object
The updated object.
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
- **params : dict
Estimator parameters.
- Returns:
- self : estimator instance
Estimator instance.
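The <component>__<parameter> form can be illustrated with a small scikit-learn Pipeline. StandardScaler is used here purely as a stand-in step, and the step name "scaler" is arbitrary. With this transformer as a step, the same pattern would address its min_group_size parameter as <step name>__min_group_size.

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

pipe = Pipeline([("scaler", StandardScaler())])

# "<component>__<parameter>": update with_mean on the step named "scaler".
pipe.set_params(scaler__with_mean=False)

print(pipe.named_steps["scaler"].with_mean)
```

set_params returns the estimator itself, so calls can be chained.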
- set_transform_request(*, cs_groups='$UNCHANGED$', cs_weights='$UNCHANGED$')
Configure whether metadata should be requested to be passed to the transform method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to transform.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- Parameters:
- cs_groups : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for cs_groups parameter in transform.
- cs_weights : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for cs_weights parameter in transform.
- Returns:
- self : object
The updated object.
- transform(X, cs_weights=None, cs_groups=None)
Transform values into cross-sectional percentile ranks.
- Parameters:
- X : array-like of shape (n_observations, n_assets)
Input matrix where each row is an observation and each column is an asset. NaNs are allowed and preserved.
- cs_weights : array-like of shape (n_observations, n_assets), optional
Optional non-negative cross-sectional weights used only to define the estimation universe through the convention cs_weights > 0. Percentile ranks are then estimated in an equal-weighted way over the selected assets. Non-estimation assets still receive percentile ranks relative to that universe. If None, all finite assets are included in the estimation universe.
- cs_groups : array-like of shape (n_observations, n_assets), optional
Integer group labels >= -1. Missing groups (-1) and groups with fewer than min_group_size estimation assets fall back to the global cross-section. If None, ranking is performed globally within each observation.
- Returns:
- P : ndarray of shape (n_observations, n_assets)
Percentile ranks in \([0.5 / N_{\mathcal{E}_t},\, 1 - 0.5 / N_{\mathcal{E}_t}]\), where \(N_{\mathcal{E}_t}\) is the size of the cross-section or group fallback used at observation \(t\). NaNs from X are preserved.
- Raises:
- ValueError
If min_group_size is not an integer >= 1, X is not a non-empty 2D array, cs_weights is invalid, cs_groups is invalid, or any observation has no estimation asset.
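The range quoted under Returns is what keeps an inverse-normal mapping finite. The sketch below checks this with the standard library only. The helper percentile_bounds is hypothetical, and the mapping to normal scores is a downstream step chosen for illustration, not something this transformer performs.

```python
import math
from statistics import NormalDist


def percentile_bounds(n):
    """Smallest and largest value of (r - 0.5) / n for ranks r in [1, n]."""
    return 0.5 / n, 1.0 - 0.5 / n


standard_normal = NormalDist()
scores = {}
for n in (1, 3, 8, 1000):
    low, high = percentile_bounds(n)
    # inv_cdf is only defined on the open interval (0, 1); the bounds
    # always fall inside it, so both normal scores are finite.
    scores[n] = (standard_normal.inv_cdf(low), standard_normal.inv_cdf(high))

for n, (z_low, z_high) in scores.items():
    print(n, math.isfinite(z_low), math.isfinite(z_high))
```

A plain r / n convention would instead assign 1 to the top-ranked asset, for which the inverse normal CDF is not finite.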