skfolio.preprocessing.CSTanhShrinker
- class skfolio.preprocessing.CSTanhShrinker(*, knee=3.0, atol=1e-12)
Cross-sectional tanh outlier shrinker.
Smoothly shrinks extreme values within an observation toward the cross-sectional center while preserving the original scale and units of the input values. Values near the center are left nearly unchanged, while extreme values are compressed inward.
NaNs are treated as missing values. They are ignored when computing the cross-sectional median and MAD and are preserved in the output.
Compared to winsorization (CSWinsorizer):
- No hard threshold. The mapping is smooth, so small data changes do not create discontinuous jumps at a clipping boundary.
- Strict monotonicity. Tail ordering is preserved because distinct inputs remain distinct after transformation.
- Smooth transformed values. This can lead to better-conditioned cross-sectional regressions and more stable coefficient estimates in downstream models.
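The contrast with hard clipping can be illustrated with a minimal NumPy sketch. The center `m = 0`, the half-width `h = 3`, and the clipping threshold of 3 are illustrative assumptions, not values taken from skfolio or CSWinsorizer.

```python
import numpy as np

# One cross-section with two distinct extreme values in the upper tail.
x = np.array([-1.0, 0.0, 1.0, 8.0, 12.0])

# Hard clipping at an illustrative threshold of +/-3:
# both tail values collapse onto the same boundary value.
clipped = np.clip(x, -3.0, 3.0)

# Tanh shrinkage with an illustrative center m = 0 and half-width h = 3:
# the tail is compressed toward m + h but its ordering is kept.
m, h = 0.0, 3.0
shrunk = m + h * np.tanh((x - m) / h)

print(clipped)
print(shrunk)
```

Clipping maps 8 and 12 to the same value, while the tanh mapping keeps them distinct and strictly below `m + h`. In double precision, `tanh` rounds to exactly 1.0 once its argument exceeds roughly 19, so for extremely distant outliers the ordering guarantee holds mathematically rather than numerically.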
For observation \(t\) with cross-section \(\mathbf{x}_t\), the transformation is
\[x_{t,i}' = m_t + h_t \cdot \tanh\!\left(\frac{x_{t,i} - m_t}{h_t}\right), \quad h_t = c \cdot s_t\]
where \(m_t = \operatorname{median}(\mathbf{x}_t)\), \(s_t = 1.4826 \cdot \operatorname{MAD}(\mathbf{x}_t)\) is a robust scale estimator consistent for the standard deviation under normality, and \(c\) is the knee parameter (see knee). The quantity \(h_t = c \cdot s_t\) is the half-width of the near-linear region for observation \(t\).
When cs_weights is provided, the median and MAD are computed from the estimation universe, defined by cs_weights > 0. Assets outside the estimation universe still receive shrunk values using those statistics. For this estimator, cs_weights is used only to define the estimation universe; the median and MAD remain equal-weighted over the selected assets.
This transformer is stateless.
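The formula and the estimation-universe rule can be written out as a short NumPy sketch. The function `cs_tanh_shrink` is a hypothetical re-implementation for illustration, not skfolio's code; it uses the rounded constant 1.4826 as documented and treats a scale at or below `atol` as degenerate, which is an assumption of this sketch.

```python
import numpy as np


def cs_tanh_shrink(X, knee=3.0, atol=1e-12, cs_weights=None):
    """Hypothetical NumPy sketch of the documented formula (not skfolio code)."""
    X = np.asarray(X, dtype=float)
    out = X.copy()
    for t in range(X.shape[0]):
        row = X[t]
        # Estimation universe: finite values, optionally restricted
        # to assets with cs_weights > 0.
        universe = np.isfinite(row)
        if cs_weights is not None:
            universe &= np.asarray(cs_weights, dtype=float)[t] > 0
        if not universe.any():
            raise ValueError(f"observation {t} has no estimation asset")
        # Equal-weighted median and MAD over the universe.
        m = np.median(row[universe])
        s = 1.4826 * np.median(np.abs(row[universe] - m))
        # Degenerate scale: leave the observation unchanged.
        # (<= rather than < so atol=0 still guards h = 0; an assumption.)
        if s <= atol:
            continue
        h = knee * s
        # Applied to every asset in the row; NaNs propagate through tanh.
        out[t] = m + h * np.tanh((row - m) / h)
    return out


X = np.array([[1.0, np.nan, 3.0, 4.0],
              [4.0, 3.0, 2.0, 1.0],
              [10.0, 20.0, np.nan, 40.0]])
weights = np.array([[1.0, 0.0, 1.0, 1.0],
                    [1.0, 0.0, 1.0, 1.0],
                    [1.0, 1.0, 0.0, 1.0]])
print(cs_tanh_shrink(X))
print(cs_tanh_shrink(X, cs_weights=weights))
```

In the weighted call, the second asset of the second row is outside the estimation universe, yet it is still shrunk using the median and MAD of the other three assets. The sketch agrees with the values in the Examples section to several decimal places; small differences in the trailing digits are expected because 1.4826 is a rounded constant.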
- Parameters:
- knee : float, default=3.0
Compression knee in robust standard deviations. It controls the width of the near-linear region around the median. Larger values reduce shrinkage. Must be finite and strictly positive.
- atol : float, default=1e-12
Absolute tolerance for the robust scale. If \(s_t\) is below atol, the observation is returned unchanged. Must be finite and non-negative.
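The effect of the two parameters can be seen on a single row. The helper `shrink_row` below is a hypothetical single-observation sketch of the documented formula, not part of skfolio, and the input values are made up for illustration.

```python
import numpy as np


def shrink_row(x, knee=3.0, atol=1e-12):
    # Hypothetical single-row helper following the documented formula.
    m = np.nanmedian(x)
    s = 1.4826 * np.nanmedian(np.abs(x - m))
    if s <= atol:
        # Degenerate robust scale: return the observation unchanged.
        return x.copy()
    h = knee * s
    return m + h * np.tanh((x - m) / h)


# A row with one large outlier: a larger knee shrinks it less.
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0, 30.0])
for knee in (1.0, 3.0, 10.0):
    print(knee, shrink_row(x, knee=knee)[-1])

# More than half of the values are identical, so the MAD is zero
# and the row is returned as is, including the value 9.0.
flat = np.array([5.0, 5.0, 5.0, 9.0])
print(shrink_row(flat))
```

The second case is worth keeping in mind: when the MAD of an observation is zero, no shrinkage is applied at all, so outliers in such rows pass through untouched.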
Methods
- fit(X[, y, cs_weights, cs_groups]): Fit the transformer.
- fit_transform(X[, y, cs_weights, cs_groups]): Fit to X and return the transformed values.
- get_feature_names_out([input_features]): Get output feature names for transformation.
- get_metadata_routing(): Get metadata routing of this object.
- get_params([deep]): Get parameters for this estimator.
- set_fit_request(*[, cs_groups, cs_weights]): Configure whether metadata should be requested to be passed to the fit method.
- set_params(**params): Set the parameters of this estimator.
- set_transform_request(*[, cs_groups, cs_weights]): Configure whether metadata should be requested to be passed to the transform method.
- transform(X[, cs_weights, cs_groups]): Shrink outliers within each observation using a tanh mapping.
See also
CSWinsorizer: Hard percentile-based clipping.
Examples
>>> import numpy as np
>>> from skfolio.preprocessing import CSTanhShrinker
>>>
>>> X = np.array([[1.0, np.nan, 3.0, 4.0],
...               [4.0, 3.0, 2.0, 1.0],
...               [10.0, 20.0, np.nan, 40.0]])
>>>
>>> transformer = CSTanhShrinker()
>>> transformer.fit_transform(X)
array([[ 1.12471866,         nan,  3.        ,  3.98348436],
       [ 3.94560619,  2.99790441,  2.00209559,  1.05439381],
       [10.16515641, 20.        ,         nan, 38.75281341]])
>>>
>>> # Use cs_weights for the estimation universe before computing the median and MAD.
>>> cs_weights = np.array([[1.0, 0.0, 1.0, 1.0],
...                        [1.0, 0.0, 1.0, 1.0],
...                        [1.0, 1.0, 0.0, 1.0]])
>>>
>>> transformer.fit_transform(X, cs_weights=cs_weights)
array([[ 1.12471866,         nan,  3.        ,  3.98348436],
       [ 3.87528134,  2.98348436,  2.        ,  1.01651564],
       [10.16515641, 20.        ,         nan, 38.75281341]])
- fit(X, y=None, cs_weights=None, cs_groups=None)
Fit the transformer.
Cross-sectional transformers are stateless and do not learn data-dependent parameters. This method validates the estimator parameters, validates X, and records n_features_in_ for scikit-learn compatibility.
- Parameters:
- X : array-like of shape (n_observations, n_assets)
Input matrix where each row is an observation and each column is an asset.
- y : Ignored
Not used, present for API consistency by convention.
- cs_weights : array-like of shape (n_observations, n_assets), optional
Optional cross-sectional weights accepted for API consistency with transform. They are ignored during fitting.
- cs_groups : array-like of shape (n_observations, n_assets), optional
Optional cross-sectional group labels accepted for API consistency with transform. They are ignored during fitting.
- Returns:
- self : BaseCSTransformer
Fitted estimator.
- fit_transform(X, y=None, cs_weights=None, cs_groups=None)
Fit to X and return the transformed values.
- Parameters:
- X : array-like of shape (n_observations, n_assets)
Input matrix where each row is an observation and each column is an asset.
- y : Ignored
Not used, present for API consistency by convention.
- cs_weights : array-like of shape (n_observations, n_assets), optional
Optional cross-sectional weights forwarded to transform.
- cs_groups : array-like of shape (n_observations, n_assets), optional
Optional cross-sectional group labels forwarded to transform.
- Returns:
- X_new : ndarray of shape (n_observations, n_assets)
Transformed array.
- get_feature_names_out(input_features=None)
Get output feature names for transformation.
- Parameters:
- input_features : array-like of str or None, default=None
Input features.
If input_features is None, then feature_names_in_ is used as feature names in. If feature_names_in_ is not defined, then the following input feature names are generated: ["x0", "x1", ..., "x(n_features_in_ - 1)"].
If input_features is an array-like, then input_features must match feature_names_in_ if feature_names_in_ is defined.
- Returns:
- feature_names_out : ndarray of str objects
Same as input features.
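The naming rule above can be summarized in a few lines. The helper `resolve_feature_names` is a hypothetical illustration of the described behavior, not a function from skfolio or scikit-learn, and the asset names are made up.

```python
import numpy as np


def resolve_feature_names(n_features_in, feature_names_in=None, input_features=None):
    """Hypothetical helper mirroring the naming rule described above."""
    if input_features is None:
        if feature_names_in is not None:
            # Names seen during fit take precedence.
            return np.asarray(feature_names_in, dtype=object)
        # Fallback: generated names x0, x1, ..., x(n_features_in - 1).
        return np.asarray([f"x{i}" for i in range(n_features_in)], dtype=object)
    input_features = np.asarray(input_features, dtype=object)
    if feature_names_in is not None and not np.array_equal(
        input_features, np.asarray(feature_names_in, dtype=object)
    ):
        raise ValueError("input_features does not match feature_names_in_")
    return input_features


print(resolve_feature_names(3))
print(resolve_feature_names(2, feature_names_in=["AAPL", "MSFT"]))
```

Because the transformer maps each asset column to one output column, the output names are the same as the input names.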
- get_metadata_routing()
Get metadata routing of this object.
Please check the User Guide on how the routing mechanism works.
- Returns:
- routing : MetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)
Get parameters for this estimator.
- Parameters:
- deep : bool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- params : dict
Parameter names mapped to their values.
- set_fit_request(*, cs_groups='$UNCHANGED$', cs_weights='$UNCHANGED$')
Configure whether metadata should be requested to be passed to the fit method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to fit.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- Parameters:
- cs_groups : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for cs_groups parameter in fit.
- cs_weights : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for cs_weights parameter in fit.
- Returns:
- self : object
The updated object.
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
- **params : dict
Estimator parameters.
- Returns:
- self : estimator instance
Estimator instance.
- set_transform_request(*, cs_groups='$UNCHANGED$', cs_weights='$UNCHANGED$')
Configure whether metadata should be requested to be passed to the transform method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to transform.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- Parameters:
- cs_groups : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for cs_groups parameter in transform.
- cs_weights : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for cs_weights parameter in transform.
- Returns:
- self : object
The updated object.
- transform(X, cs_weights=None, cs_groups=None)
Shrink outliers within each observation using a tanh mapping.
- Parameters:
- X : array-like of shape (n_observations, n_assets)
Input matrix where each row is an observation and each column is an asset. NaNs are allowed and preserved.
- cs_weights : array-like of shape (n_observations, n_assets), optional
Optional non-negative cross-sectional weights used only to define the estimation universe through the convention cs_weights > 0. The median and MAD are then computed in an equal-weighted way over the selected assets. Non-estimation assets still receive shrunk values using those statistics. If None, all finite assets are used.
- cs_groups : array-like of shape (n_observations, n_assets), optional
Not used, present for API consistency by convention.
- Returns:
- X_shrunk : ndarray of shape (n_observations, n_assets)
Shrunk values in the original scale. NaN values from the input are preserved.
- Raises:
- ValueError
If knee is not finite or <= 0, atol is not finite or < 0, X is not a non-empty 2D array, cs_weights is invalid, or any observation has no estimation asset.
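The parameter constraints listed under Raises can be expressed as a small check. The function `check_shrinker_params` is a hypothetical sketch of those two conditions only; it is not skfolio's validation code, and the error messages are invented for illustration.

```python
import math


def check_shrinker_params(knee, atol):
    """Hypothetical validator for the documented constraints on knee and atol."""
    # knee must be finite and strictly positive.
    if not math.isfinite(knee) or knee <= 0:
        raise ValueError(f"knee must be finite and > 0, got {knee!r}")
    # atol must be finite and non-negative (zero is allowed).
    if not math.isfinite(atol) or atol < 0:
        raise ValueError(f"atol must be finite and >= 0, got {atol!r}")


check_shrinker_params(knee=3.0, atol=1e-12)  # defaults pass
check_shrinker_params(knee=0.5, atol=0.0)    # atol = 0 is allowed

for bad in ({"knee": 0.0, "atol": 1e-12},
            {"knee": float("inf"), "atol": 1e-12},
            {"knee": 3.0, "atol": -1.0}):
    try:
        check_shrinker_params(**bad)
    except ValueError as exc:
        print(exc)
```

Note the asymmetry between the two parameters: `knee = 0` is rejected because it would make the half-width \(h_t\) zero, while `atol = 0` is accepted.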