skfolio.preprocessing.CSWinsorizer
- class skfolio.preprocessing.CSWinsorizer(*, low=0.01, high=0.99)
Cross-sectional winsorization.
Clips each finite value within an observation to the interval between the low and high percentiles of that observation’s cross-section.
NaNs are treated as missing values. They are ignored when computing cross-sectional percentiles and are preserved in the output.
When cs_weights is provided, percentile boundaries are computed on the estimation universe, defined by cs_weights > 0. Assets outside the estimation universe still receive clipped values using those boundaries. For this estimator, cs_weights is used only to define the estimation universe; percentile estimation itself remains equal-weighted over the selected assets.
This transformer is stateless.
- Parameters:
- low : float, default=0.01
Lower percentile used for clipping. Must satisfy \(0 \le \text{low} < \text{high} \le 1\).
- high : float, default=0.99
Upper percentile used for clipping. Must satisfy \(0 \le \text{low} < \text{high} \le 1\).
Methods
- fit(X[, y, cs_weights, cs_groups]): Fit the transformer.
- fit_transform(X[, y, cs_weights, cs_groups]): Fit to X and return the transformed values.
- get_feature_names_out([input_features]): Get output feature names for transformation.
- get_metadata_routing(): Get metadata routing of this object.
- get_params([deep]): Get parameters for this estimator.
- set_fit_request(*[, cs_groups, cs_weights]): Configure whether metadata should be requested to be passed to the fit method.
- set_params(**params): Set the parameters of this estimator.
- set_transform_request(*[, cs_groups, cs_weights]): Configure whether metadata should be requested to be passed to the transform method.
- transform(X[, cs_weights, cs_groups]): Winsorize each observation to low/high percentiles.
See also
CSTanhShrinker: Smoothly shrinks extreme values.
Examples
>>> import numpy as np
>>> from skfolio.preprocessing import CSWinsorizer
>>>
>>> X = np.array([[1.0, np.nan, 3.0, 4.0],
...               [4.0, 3.0, 2.0, 1.0],
...               [10.0, 20.0, np.nan, 40.0]])
>>>
>>> transformer = CSWinsorizer(low=0.1, high=0.9)
>>> transformer.fit_transform(X)
array([[ 1.4,  nan,  3. ,  3.8],
       [ 3.7,  3. ,  2. ,  1.3],
       [12. , 20. ,  nan, 36. ]])
>>>
>>> # Use cs_weights for the estimation universe before computing the clip bounds.
>>> cs_weights = np.array([[1.0, 0.0, 1.0, 1.0],
...                        [1.0, 0.0, 1.0, 1.0],
...                        [1.0, 1.0, 0.0, 1.0]])
>>>
>>> transformer.fit_transform(X, cs_weights=cs_weights)
array([[ 1.4,  nan,  3. ,  3.8],
       [ 3.6,  3. ,  2. ,  1.2],
       [12. , 20. ,  nan, 36. ]])
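The clipping rule described for this class can be sketched in plain NumPy. This is an illustrative sketch, not the skfolio implementation: the helper name cs_winsorize is hypothetical, and it assumes NumPy's default linear-interpolation quantiles, which happen to reproduce the values in the example above.

```python
import numpy as np


def cs_winsorize(X, low=0.01, high=0.99, cs_weights=None):
    """Row-wise winsorization sketch (hypothetical helper, not skfolio code)."""
    X = np.asarray(X, dtype=float)
    finite = np.isfinite(X)
    # Estimation universe: finite values, restricted to cs_weights > 0 if given.
    universe = finite if cs_weights is None else finite & (np.asarray(cs_weights) > 0)
    # Hide everything outside the universe so the quantiles ignore it.
    masked = np.where(universe, X, np.nan)
    lo = np.nanquantile(masked, low, axis=1, keepdims=True)
    hi = np.nanquantile(masked, high, axis=1, keepdims=True)
    # Clip every finite value, including assets outside the universe;
    # non-finite entries (such as NaN) are passed through unchanged.
    return np.where(finite, np.clip(X, lo, hi), X)


X = np.array([[1.0, np.nan, 3.0, 4.0],
              [4.0, 3.0, 2.0, 1.0],
              [10.0, 20.0, np.nan, 40.0]])
cs_weights = np.array([[1.0, 0.0, 1.0, 1.0],
                       [1.0, 0.0, 1.0, 1.0],
                       [1.0, 1.0, 0.0, 1.0]])

plain = cs_winsorize(X, low=0.1, high=0.9)
with_universe = cs_winsorize(X, low=0.1, high=0.9, cs_weights=cs_weights)
```

In the second row, the asset with weight 0 (value 3.0) is left out of the quantile computation, so the bounds come from {1, 2, 4} instead of {1, 2, 3, 4}, yet that asset is still clipped against the resulting bounds.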
- fit(X, y=None, cs_weights=None, cs_groups=None)
Fit the transformer.
Cross-sectional transformers are stateless and do not learn data-dependent parameters. This method validates the estimator parameters, validates X, and records n_features_in_ for scikit-learn compatibility.
- Parameters:
- X : array-like of shape (n_observations, n_assets)
Input matrix where each row is an observation and each column is an asset.
- y : Ignored
Not used, present for API consistency by convention.
- cs_weights : array-like of shape (n_observations, n_assets), optional
Optional cross-sectional weights accepted for API consistency with transform. They are ignored during fitting.
- cs_groups : array-like of shape (n_observations, n_assets), optional
Optional cross-sectional group labels accepted for API consistency with transform. They are ignored during fitting.
- Returns:
- self : BaseCSTransformer
Fitted estimator.
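To illustrate what a stateless fit means in practice, here is a minimal sketch using a hypothetical stand-in class, StatelessRowClipper. It is not BaseCSTransformer or the skfolio code; it only assumes the behaviour described above: fit validates its inputs and records n_features_in_, while every clipping bound is recomputed from the data passed to transform.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin


class StatelessRowClipper(TransformerMixin, BaseEstimator):
    """Hypothetical stand-in for a stateless cross-sectional transformer."""

    def __init__(self, *, low=0.01, high=0.99):
        self.low = low
        self.high = high

    def fit(self, X, y=None):
        if not 0 <= self.low < self.high <= 1:
            raise ValueError("low and high must satisfy 0 <= low < high <= 1")
        X = np.asarray(X, dtype=float)
        if X.ndim != 2 or X.size == 0:
            raise ValueError("X must be a non-empty 2D array")
        # Nothing data-dependent is stored; only the input width is recorded.
        self.n_features_in_ = X.shape[1]
        return self

    def transform(self, X):
        X = np.asarray(X, dtype=float)
        # Bounds come from the rows being transformed, never from fit data.
        lo = np.nanquantile(X, self.low, axis=1, keepdims=True)
        hi = np.nanquantile(X, self.high, axis=1, keepdims=True)
        return np.clip(X, lo, hi)


rng = np.random.default_rng(0)
X_train, X_new = rng.normal(size=(5, 4)), rng.normal(size=(3, 4))

fitted_on_train = StatelessRowClipper(low=0.1, high=0.9).fit(X_train)
fitted_on_new = StatelessRowClipper(low=0.1, high=0.9).fit(X_new)
out_a = fitted_on_train.transform(X_new)
out_b = fitted_on_new.transform(X_new)
```

Both estimators return the same output for X_new, because the data seen during fit has no influence on the transformation.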
- fit_transform(X, y=None, cs_weights=None, cs_groups=None)
Fit to X and return the transformed values.
- Parameters:
- X : array-like of shape (n_observations, n_assets)
Input matrix where each row is an observation and each column is an asset.
- y : Ignored
Not used, present for API consistency by convention.
- cs_weights : array-like of shape (n_observations, n_assets), optional
Optional cross-sectional weights forwarded to transform.
- cs_groups : array-like of shape (n_observations, n_assets), optional
Optional cross-sectional group labels forwarded to transform.
- Returns:
- X_new : ndarray of shape (n_observations, n_assets)
Transformed array.
- get_feature_names_out(input_features=None)
Get output feature names for transformation.
- Parameters:
- input_features : array-like of str or None, default=None
Input features.
If input_features is None, then feature_names_in_ is used as the input feature names. If feature_names_in_ is not defined, then the following input feature names are generated: ["x0", "x1", ..., "x(n_features_in_ - 1)"].
If input_features is an array-like, then input_features must match feature_names_in_ if feature_names_in_ is defined.
- Returns:
- feature_names_out : ndarray of str objects
Same as input features.
- get_metadata_routing()
Get metadata routing of this object.
See the User Guide for how the routing mechanism works.
- Returns:
- routing : MetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)
Get parameters for this estimator.
- Parameters:
- deep : bool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- params : dict
Parameter names mapped to their values.
- set_fit_request(*, cs_groups='$UNCHANGED$', cs_weights='$UNCHANGED$')
Configure whether metadata should be requested to be passed to the fit method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to fit.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- Parameters:
- cs_groups : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for cs_groups parameter in fit.
- cs_weights : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for cs_weights parameter in fit.
- Returns:
- self : object
The updated object.
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
- **params : dict
Estimator parameters.
- Returns:
- self : estimator instance
Estimator instance.
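A short sketch of the <component>__<parameter> convention follows. DemoWinsorizer is a hypothetical stand-in that only mirrors the low and high constructor parameters of this class, and the step name "winsorize" is arbitrary; get_params and set_params themselves come from scikit-learn's BaseEstimator.

```python
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline


class DemoWinsorizer(TransformerMixin, BaseEstimator):
    """Hypothetical stand-in with the same constructor parameters."""

    def __init__(self, *, low=0.01, high=0.99):
        self.low = low
        self.high = high

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X


# Simple estimator: parameters are addressed by their own names.
est = DemoWinsorizer()
params = est.get_params()
est.set_params(low=0.05)

# Nested object: the step name and parameter are joined by a double underscore.
pipe = Pipeline([("winsorize", DemoWinsorizer())])
pipe.set_params(winsorize__high=0.95)
nested_high = pipe.get_params(deep=True)["winsorize__high"]
```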
- set_transform_request(*, cs_groups='$UNCHANGED$', cs_weights='$UNCHANGED$')
Configure whether metadata should be requested to be passed to the transform method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to transform.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- Parameters:
- cs_groups : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for cs_groups parameter in transform.
- cs_weights : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for cs_weights parameter in transform.
- Returns:
- self : object
The updated object.
- transform(X, cs_weights=None, cs_groups=None)
Winsorize each observation to low/high percentiles.
- Parameters:
- X : array-like of shape (n_observations, n_assets)
Input matrix where each row is an observation and each column is an asset. NaNs are allowed and preserved.
- cs_weights : array-like of shape (n_observations, n_assets), optional
Optional non-negative cross-sectional weights used only to define the estimation universe through the convention cs_weights > 0. Percentile boundaries are then estimated in an equal-weighted way over the selected assets. Non-estimation assets still receive clipped values using those boundaries. If None, all finite assets are used to compute percentiles.
- cs_groups : array-like of shape (n_observations, n_assets), optional
Not used, present for API consistency by convention.
- Returns:
- X_clipped : ndarray of shape (n_observations, n_assets)
Winsorized values. NaN values from the input are preserved.
- Raises:
- ValueError
If low/high are invalid, X is not a non-empty 2D array, cs_weights is invalid, or any observation has no estimation asset.
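The error conditions listed above can be approximated with a small validation helper. This is a sketch under assumptions, not the library's validation code: check_estimation_universe is a hypothetical name, an "estimation asset" is assumed to be a finite value with cs_weights > 0, and "invalid" weights are assumed to mean a shape mismatch or a negative entry.

```python
import numpy as np


def check_estimation_universe(X, cs_weights=None):
    """Return the boolean estimation-universe mask or raise ValueError (sketch)."""
    X = np.asarray(X, dtype=float)
    if X.ndim != 2 or X.size == 0:
        raise ValueError("X must be a non-empty 2D array")
    # Assumption: only finite values can take part in percentile estimation.
    universe = np.isfinite(X)
    if cs_weights is not None:
        w = np.asarray(cs_weights, dtype=float)
        if w.shape != X.shape or np.any(w < 0):
            raise ValueError("cs_weights must be non-negative and match X's shape")
        universe &= w > 0
    # Every row needs at least one asset from which to estimate the bounds.
    if not universe.any(axis=1).all():
        raise ValueError("at least one observation has no estimation asset")
    return universe


X = np.array([[1.0, 2.0],
              [np.nan, 3.0]])
mask = check_estimation_universe(X)

try:
    # Second row: the only finite asset has weight 0, so nothing is left.
    check_estimation_universe(X, cs_weights=np.array([[1.0, 0.0], [1.0, 0.0]]))
    error_message = None
except ValueError as exc:
    error_message = str(exc)
```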