skfolio.preprocessing.CSStandardScaler
- class skfolio.preprocessing.CSStandardScaler(*, min_group_size=8, atol=1e-12)
Cross-sectional standardization.
Standardizes each finite value within an observation’s cross-section to have weighted mean zero and unit equal-weighted standard deviation over the estimation universe.
When cs_weights is provided, weighted means and unbiased equal-weighted standard deviations are estimated only on the estimation universe, defined by cs_weights > 0. Assets outside that universe still receive standardized values relative to the estimation universe. For this estimator, cs_weights is used to define the estimation universe and to compute the cross-sectional mean, while the standard deviation remains equal-weighted over the selected assets.
NaNs are treated as missing values. They are ignored when computing cross-sectional statistics and are preserved in the output.
When cs_groups is None, standardization is performed globally within each observation. For observation \(t\), the standardized value \(z_{t,i}\) is defined by:
\[z_{t,i} = \frac{x_{t,i} - \mu_t}{\sigma_t}\]
where \(\mu_t\) is the weighted mean, \(\sigma_t\) is the unbiased equal-weighted standard deviation, \(\mathcal{E}_t\) is the estimation universe, and \(N_{\mathcal{E}_t}\) is its number of assets:
\[\mu_t = \frac{\sum_{i \in \mathcal{E}_t} w_{t,i} x_{t,i}} {\sum_{i \in \mathcal{E}_t} w_{t,i}}, \quad \sigma_t = \sqrt{\frac{1}{N_{\mathcal{E}_t} - 1} \sum_{i \in \mathcal{E}_t} (x_{t,i} - \mu_t)^2}\]
When cs_groups is provided, the same centering and scaling scheme is first applied within each group. Groups with fewer than min_group_size estimation assets, and missing groups (cs_groups == -1), fall back to global cross-sectional statistics. The grouped result is then globally recentered to weighted mean zero and globally rescaled to unit equal-weighted standard deviation over the estimation universe.
This transformer is stateless.
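The global formulas above can be checked by hand with a short NumPy sketch. This is not the library's implementation: the function name cs_standardize_global is hypothetical, and the sketch assumes every row has at least two estimation assets and measurable dispersion (it omits the atol handling).

```python
import numpy as np


def cs_standardize_global(X, cs_weights=None):
    """Row-wise z-scores: weighted mean, unbiased equal-weighted std (sketch)."""
    X = np.asarray(X, dtype=float)
    finite = np.isfinite(X)
    if cs_weights is None:
        # Unit weight on every finite asset.
        w = finite.astype(float)
    else:
        # Missing values cannot contribute to the statistics.
        w = np.where(finite, np.asarray(cs_weights, dtype=float), 0.0)
    universe = w > 0  # estimation universe E_t
    x0 = np.where(finite, X, 0.0)  # NaNs replaced so the sums stay finite

    # Weighted mean over the estimation universe.
    mu = (w * x0).sum(axis=1, keepdims=True) / w.sum(axis=1, keepdims=True)
    # Unbiased equal-weighted standard deviation around the weighted mean.
    n = universe.sum(axis=1, keepdims=True)
    sq_dev = np.where(universe, (x0 - mu) ** 2, 0.0)
    sigma = np.sqrt(sq_dev.sum(axis=1, keepdims=True) / (n - 1))

    # NaNs in X propagate to the output unchanged.
    return (X - mu) / sigma


X = np.array([[1.0, np.nan, 3.0, 4.0],
              [4.0, 3.0, 2.0, 1.0],
              [10.0, 20.0, np.nan, 40.0]])
Z = cs_standardize_global(X)
```

With no weights, the first row has mean 8/3 and unbiased standard deviation of about 1.5275, so its first entry is about -1.0911, in line with the first array shown in the Examples section.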
- Parameters:
- min_group_size : int, default=8
Minimum number of estimation assets required in a group. Smaller groups fall back to global cross-sectional statistics.
- atol : float, default=1e-12
Absolute tolerance below which the cross-sectional standard deviation is treated as zero. When cs_groups is None, this means that the observation has no measurable cross-sectional dispersion on its estimation universe, so finite outputs are set to zero rather than NaN and the row is treated as a neutral exposure. When cs_groups is provided, the same convention applies to the within-group standardization step and to the final global rescaling step.
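The atol convention for the ungrouped case can be illustrated on a single row. The helper scale_row below is hypothetical and uses unit weights. The strict comparison sigma < atol is an assumption based on the wording "below which", not a statement about the library's code.

```python
import numpy as np


def scale_row(x, atol=1e-12):
    """Standardize one row; a flat row maps to zeros (sketch of the atol rule)."""
    x = np.asarray(x, dtype=float)
    finite = np.isfinite(x)
    mu = x[finite].mean()
    sigma = x[finite].std(ddof=1)  # unbiased, equal-weighted
    if sigma < atol:  # assumed comparison, see the note above
        # No measurable dispersion: finite entries become a neutral exposure
        # of zero, while missing values stay missing.
        return np.where(finite, 0.0, np.nan)
    return (x - mu) / sigma


flat = scale_row([5.0, 5.0, np.nan, 5.0])  # zero dispersion
spread = scale_row([1.0, 2.0, 3.0])        # ordinary case
```

Here flat contains zeros at the finite positions and NaN at the missing one, while spread is the usual z-score of the row.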
Methods
- fit(X[, y, cs_weights, cs_groups])
Fit the transformer.
- fit_transform(X[, y, cs_weights, cs_groups])
Fit to X and return the transformed values.
- get_feature_names_out([input_features])
Get output feature names for transformation.
- get_metadata_routing()
Get metadata routing of this object.
- get_params([deep])
Get parameters for this estimator.
- set_fit_request(*[, cs_groups, cs_weights])
Configure whether metadata should be requested to be passed to the fit method.
- set_params(**params)
Set the parameters of this estimator.
- set_transform_request(*[, cs_groups, cs_weights])
Configure whether metadata should be requested to be passed to the transform method.
- transform(X[, cs_weights, cs_groups])
Standardize each observation into cross-sectional z-scores.
Examples
>>> import numpy as np
>>> from skfolio.preprocessing import CSStandardScaler
>>>
>>> X = np.array([[1.0, np.nan, 3.0, 4.0],
...               [4.0, 3.0, 2.0, 1.0],
...               [10.0, 20.0, np.nan, 40.0]])
>>>
>>> transformer = CSStandardScaler()
>>> transformer.fit_transform(X)
array([[-1.09108945,         nan,  0.21821789,  0.87287156],
       [ 1.161895  ,  0.38729833, -0.38729833, -1.161895  ],
       [-0.87287156, -0.21821789,         nan,  1.09108945]])
>>>
>>> # Use cs_weights for the estimation universe and weighted means, then standardize within groups.
>>> cs_weights = np.array([[3.0, 0.0, 1.0, 2.0],
...                        [4.0, 0.0, 2.0, 3.0],
...                        [2.0, 3.0, 0.0, 5.0]])
>>> cs_groups = np.array([[0, 0, 1, 1],
...                       [0, 0, 1, 1],
...                       [0, 0, 1, 1]])
>>>
>>> transformer = CSStandardScaler(min_group_size=2)
>>> transformer.fit_transform(X, cs_weights=cs_weights, cs_groups=cs_groups)
array([[-0.55454325,         nan, -0.62182063,  1.1427252 ],
       [ 0.62254586, -0.15324206,  0.5035012 , -1.16572861],
       [-1.33736075,  0.20821245,         nan,  0.41001683]])
- fit(X, y=None, cs_weights=None, cs_groups=None)
Fit the transformer.
Cross-sectional transformers are stateless and do not learn data-dependent parameters. This method validates the estimator parameters, validates X, and records n_features_in_ for scikit-learn compatibility.
- Parameters:
- X : array-like of shape (n_observations, n_assets)
Input matrix where each row is an observation and each column is an asset.
- y : Ignored
Not used, present for API consistency by convention.
- cs_weights : array-like of shape (n_observations, n_assets), optional
Optional cross-sectional weights accepted for API consistency with transform. They are ignored during fitting.
- cs_groups : array-like of shape (n_observations, n_assets), optional
Optional cross-sectional group labels accepted for API consistency with transform. They are ignored during fitting.
- Returns:
- self : BaseCSTransformer
Fitted estimator.
- fit_transform(X, y=None, cs_weights=None, cs_groups=None)
Fit to X and return the transformed values.
- Parameters:
- X : array-like of shape (n_observations, n_assets)
Input matrix where each row is an observation and each column is an asset.
- y : Ignored
Not used, present for API consistency by convention.
- cs_weights : array-like of shape (n_observations, n_assets), optional
Optional cross-sectional weights forwarded to transform.
- cs_groups : array-like of shape (n_observations, n_assets), optional
Optional cross-sectional group labels forwarded to transform.
- Returns:
- X_new : ndarray of shape (n_observations, n_assets)
Transformed array.
- get_feature_names_out(input_features=None)
Get output feature names for transformation.
- Parameters:
- input_features : array-like of str or None, default=None
Input features.
If input_features is None, then feature_names_in_ is used as feature names in. If feature_names_in_ is not defined, then the following input feature names are generated: ["x0", "x1", ..., "x(n_features_in_ - 1)"].
If input_features is an array-like, then input_features must match feature_names_in_ if feature_names_in_ is defined.
- Returns:
- feature_names_out : ndarray of str objects
Same as input features.
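The name-resolution rules described above can be written out as a small pure-Python sketch. The function resolve_feature_names is hypothetical and only mirrors the documented rules. It returns a plain list rather than an ndarray of str, and it does not reproduce any additional validation the real method may perform.

```python
def resolve_feature_names(n_features_in, feature_names_in=None, input_features=None):
    """Sketch of the documented rules for choosing output feature names."""
    if input_features is None:
        if feature_names_in is not None:
            # Names seen during fit take precedence.
            return list(feature_names_in)
        # No names available: generate x0, x1, ..., x(n_features_in - 1).
        return [f"x{i}" for i in range(n_features_in)]
    names = list(input_features)
    if feature_names_in is not None and names != list(feature_names_in):
        raise ValueError("input_features does not match feature_names_in_")
    return names


generated = resolve_feature_names(3)
passed_through = resolve_feature_names(2, input_features=["AAPL", "MSFT"])
```

Because the transformer maps each asset column to one output column, the output names are the same as the resolved input names.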
- get_metadata_routing()
Get metadata routing of this object.
Please check the User Guide on how the routing mechanism works.
- Returns:
- routing : MetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)
Get parameters for this estimator.
- Parameters:
- deep : bool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- params : dict
Parameter names mapped to their values.
- set_fit_request(*, cs_groups='$UNCHANGED$', cs_weights='$UNCHANGED$')
Configure whether metadata should be requested to be passed to the fit method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to fit.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- Parameters:
- cs_groups : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for cs_groups parameter in fit.
- cs_weights : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for cs_weights parameter in fit.
- Returns:
- self : object
The updated object.
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
- **params : dict
Estimator parameters.
- Returns:
- self : estimator instance
Estimator instance.
- set_transform_request(*, cs_groups='$UNCHANGED$', cs_weights='$UNCHANGED$')
Configure whether metadata should be requested to be passed to the transform method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to transform.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- Parameters:
- cs_groups : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for cs_groups parameter in transform.
- cs_weights : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for cs_weights parameter in transform.
- Returns:
- self : object
The updated object.
- transform(X, cs_weights=None, cs_groups=None)
Standardize each observation into cross-sectional z-scores.
- Parameters:
- X : array-like of shape (n_observations, n_assets)
Input matrix where each row represents an observation and each column represents an asset. NaNs are allowed and preserved.
- cs_weights : array-like of shape (n_observations, n_assets), optional
Optional non-negative cross-sectional weights. Positive weights define the estimation universe and are used to compute weighted means. The standard deviation remains equal-weighted over the selected assets. If None, all finite assets are included in the estimation universe with unit weight.
- cs_groups : array-like of shape (n_observations, n_assets), optional
Integer group labels >= -1. Missing groups (-1) and groups with fewer than min_group_size estimation assets fall back to global cross-sectional statistics. If None, standardization is performed globally within each observation.
- Returns:
- Z : ndarray of shape (n_observations, n_assets)
Standardized values with weighted mean zero and unit equal-weighted standard deviation over the estimation universe.
- Raises:
- ValueError
If min_group_size < 1, atol < 0, X is not a non-empty 2D array, cs_weights is invalid, cs_groups is invalid, or any observation has no estimation asset.
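The grouped path (within-group standardization, fallback to global statistics, then a final global recentering and rescaling) can be sketched one row at a time in NumPy. This is a reading of the description above, not the library's code. The helpers _stats and standardize_row_grouped are hypothetical, and the sketch assumes every group that is used has at least two estimation assets and non-zero dispersion (no atol handling, no input validation).

```python
import numpy as np


def _stats(x, w):
    """Weighted mean and unbiased equal-weighted std over assets with w > 0."""
    sel = w > 0
    mu = np.sum(w[sel] * x[sel]) / np.sum(w[sel])
    sigma = np.sqrt(np.sum((x[sel] - mu) ** 2) / (sel.sum() - 1))
    return mu, sigma


def standardize_row_grouped(x, w, groups, min_group_size=8):
    x = np.asarray(x, dtype=float)
    groups = np.asarray(groups)
    finite = np.isfinite(x)
    # Missing values are excluded from the estimation universe.
    w = np.where(finite, np.asarray(w, dtype=float), 0.0)
    universe = w > 0

    # Step 1: global z-scores, used as the fallback for every asset.
    mu, sigma = _stats(x, w)
    z = (x - mu) / sigma

    # Step 2: overwrite with within-group z-scores for large enough groups.
    for g in np.unique(groups):
        members = groups == g
        if g == -1 or (members & universe).sum() < min_group_size:
            continue  # missing or small group keeps the global fallback
        mu_g, sigma_g = _stats(x, np.where(members, w, 0.0))
        z[members] = (x[members] - mu_g) / sigma_g

    # Step 3: global recentering and rescaling over the estimation universe.
    mu_z, sigma_z = _stats(z, w)
    return (z - mu_z) / sigma_z


X = np.array([[1.0, np.nan, 3.0, 4.0],
              [4.0, 3.0, 2.0, 1.0],
              [10.0, 20.0, np.nan, 40.0]])
cs_weights = np.array([[3.0, 0.0, 1.0, 2.0],
                       [4.0, 0.0, 2.0, 3.0],
                       [2.0, 3.0, 0.0, 5.0]])
cs_groups = np.array([[0, 0, 1, 1],
                      [0, 0, 1, 1],
                      [0, 0, 1, 1]])
Z = np.vstack([
    standardize_row_grouped(x, w, g, min_group_size=2)
    for x, w, g in zip(X, cs_weights, cs_groups)
])
```

On these inputs, which are the ones used in the second example of the Examples section, the sketch yields values that agree with the documented output to the displayed precision. In the second row, the asset with zero weight lies outside the estimation universe yet still receives a finite standardized value.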