skfolio.moments.EWVariance
- class skfolio.moments.EWVariance(half_life=40, assume_centered=True, min_observations=None, window_size=None)
Exponentially Weighted Variance estimator.
This is the variance-only counterpart of EWCovariance, computing only the diagonal elements (variances) and assuming zero correlation. This is appropriate when:
Estimating idiosyncratic (specific) risk in factor models, where residual returns are uncorrelated by construction
Working with orthogonalized or uncorrelated return series
The full covariance structure is not needed or is constructed separately
This estimator uses the recursive EWMA formula:
\[\sigma^2_{i,t} = \lambda \sigma^2_{i,t-1} + (1-\lambda) r_{i,t}^2\]
where \(\lambda\) is the decay factor, which determines how much weight is given to past observations. It is computed from the half-life parameter:
\[\lambda = 2^{-1/\text{half-life}}\]
The half-life is the number of observations after which an observation’s weight has decayed to 50%.
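The two formulas above can be sketched in plain Python. This is a minimal illustration, not the library's implementation: the function names `decay_factor` and `ewma_variance` are hypothetical, and the recursion is assumed to start from zero with no bias correction applied.

```python
def decay_factor(half_life: float) -> float:
    """Decay factor lambda = 2 ** (-1 / half_life)."""
    return 2.0 ** (-1.0 / half_life)


def ewma_variance(returns: list[float], half_life: float) -> float:
    """Run sigma2_t = lam * sigma2_{t-1} + (1 - lam) * r_t ** 2 over `returns`.

    Assumption: the recursion starts from zero and the raw (uncorrected)
    accumulator is returned.
    """
    lam = decay_factor(half_life)
    sigma2 = 0.0
    for r in returns:
        sigma2 = lam * sigma2 + (1.0 - lam) * r * r
    return sigma2


lam = decay_factor(40)
# After exactly `half_life` steps, an observation's weight has halved.
weight_ratio = lam ** 40
# A single 2% return contributes (1 - lam) * 0.02 ** 2 to the estimate.
single_step = ewma_variance([0.02], half_life=40)
```

Here `weight_ratio` equals 0.5 up to floating-point error, which is the defining property of the half-life.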
This estimator supports both batch fitting via fit and incremental updates via partial_fit, making it suitable for online learning scenarios.
NaN handling:
The estimator handles missing data (NaN returns) caused by late listings, delistings, and holidays using EWMA updates together with active_mask. An asset with active_mask=True is treated as active at time \(t\). If its return is finite, the EWMA is updated normally. If its return is NaN, the observation is treated as a holiday and the previous variance is kept. An asset with active_mask=False is treated as inactive, for example during pre-listing or post-delisting periods, and its variance is set to NaN.
Active with valid return: Normal EWMA update.
Active with NaN return (holiday): Freeze; the previous variance is kept.
Inactive (active_mask=False): Variance is set to NaN.
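The three cases can be sketched as a per-asset state update. This is an illustrative sketch only: `update_asset` and the `(accumulator, n_valid)` state pair are hypothetical names, and resetting the observation count when an asset turns inactive is an assumption rather than documented behaviour.

```python
import math


def update_asset(state, r, active, lam):
    """One EWMA step for a single asset; `state` is (accumulator, n_valid)."""
    acc, n_valid = state
    if not active:
        # Inactive (pre-listing or post-delisting): variance becomes NaN.
        # Assumption: the valid-observation count is reset as well.
        return (math.nan, 0)
    if math.isnan(r):
        # Active but no return (holiday): freeze the previous state.
        return (acc, n_valid)
    if math.isnan(acc):
        # First valid observation after an inactive spell: start from zero.
        acc = 0.0
    return (lam * acc + (1.0 - lam) * r * r, n_valid + 1)


lam = 0.5  # half-life of one observation, chosen to keep the arithmetic simple
returns = [math.nan, 0.1, math.nan, 0.2, math.nan]
active = [False, True, True, True, False]

state = (math.nan, 0)
history = []
for r, is_active in zip(returns, active):
    state = update_asset(state, r, is_active, lam)
    history.append(state)
```

In this run the asset is inactive at the first step, updates at the second, is frozen across the holiday at the third, updates again at the fourth, and returns to NaN once it is delisted.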
When active_mask is not provided, trailing NaN returns are ambiguous: they could correspond either to holidays, in which case the variance is frozen, or to inactive periods, in which case the variance is set to NaN.
Late-listing bias correction:
When an asset becomes active (late listing), the EWMA recursion is initialized at zero rather than at the first squared return. This zero-initialization introduces a transient downward scale bias: after \(n_i\) valid observations, the raw EWMA weights sum to \((1 - \lambda^{n_i})\) instead of 1. At output time, a per-asset correction removes this bias:
\[\hat{\sigma}^2_i = \frac{S_i}{1 - \lambda^{n_i}}\]
where \(S_i\) is the raw internal EWMA accumulator. For assets with a long history, the correction is negligible (\(\lambda^{n_i} \to 0\)).
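A small worked example may help show the size of the bias. The sketch below is not the library's code: `raw_and_corrected` is a hypothetical helper that runs the zero-initialized recursion on a constant return series (so the true second moment is known) and then applies the correction above. It assumes at least one observation.

```python
def raw_and_corrected(returns, half_life):
    """Return (S_i, corrected variance, weight sum) for one asset."""
    lam = 2.0 ** (-1.0 / half_life)
    acc = 0.0  # zero-initialized accumulator S_i
    for r in returns:
        acc = lam * acc + (1.0 - lam) * r * r
    n = len(returns)
    weight_sum = 1.0 - lam ** n  # sum of the raw EWMA weights after n steps
    return acc, acc / weight_sum, weight_sum


# 40 observations of a constant 1% return with half_life=40:
# the true second moment is 0.01 ** 2 = 1e-4.
raw, corrected, weight_sum = raw_and_corrected([0.01] * 40, half_life=40)
```

With exactly half_life observations the raw weights sum to 0.5, so the raw accumulator is half the true value and the corrected estimate recovers 1e-4. This is also the level of bias at which the default warm-up threshold int(half_life) is reached.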
The min_observations parameter controls a warm-up period: an asset’s variance estimate remains NaN in the output until it has accumulated enough valid observations for a reliable estimate.
- Parameters:
- half_life : float, default=40
Half-life of the exponential weights in number of observations.
The half-life controls how quickly older observations lose their influence:
Larger half-life: More stable estimates, slower to adapt (robust to noise)
Smaller half-life: More responsive estimates, faster to adapt (sensitive to noise)
The decay factor \(\lambda\) is computed as: \(\lambda = 2^{-1/\text{half-life}}\)
For example:
half-life = 40: \(\lambda \approx 0.983\)
half-life = 23: \(\lambda \approx 0.970\)
half-life = 11: \(\lambda \approx 0.939\)
half-life = 6: \(\lambda \approx 0.891\)
Note
For portfolio optimization, larger half-lives (>= 20) are generally preferred to avoid excessive turnover from estimation noise.
- assume_centered : bool, default=True
If True (default), the EWMA update uses raw returns without demeaning. This is the standard convention for EWMA variance estimation in finance. If False, returns are demeaned using an EWMA mean estimate before computing the variance update, and location_ tracks the EWMA mean.
- min_observations : int, optional
Minimum number of valid observations per asset before its variance estimate is considered reliable and exposed in the output variance_. Until this threshold is reached, the asset’s variance estimate remains NaN.
The default (None) uses int(half_life) as the threshold, ensuring the late-listing initialization bias has decayed to at most 50%. Set to 1 to disable warm-up entirely.
- window_size : int, optional
If set, the data is truncated to the last window_size observations before fitting. Only applies to the initial fit call (or equivalently, the first partial_fit call); subsequent partial_fit calls use all provided data.
This is a computational optimization for very long time series. Due to exponential decay, observations far in the past contribute negligibly to the current estimate. For example, with half-life = 23 (\(\lambda \approx 0.970\)), observations beyond ~150 periods together carry only about 1% of the total weight. Truncating to a reasonable window (e.g., 252 trading days) speeds up computation without materially affecting results.
The default (None) uses all available data.
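To judge how much a given window_size discards, one can compute the total normalized weight that falls on observations older than the window. For an infinite history the EWMA weights are \((1-\lambda)\lambda^k\), so the tail beyond k observations sums to \(\lambda^k\). The helper `tail_weight` below is a hypothetical name used only for this sketch.

```python
def tail_weight(half_life: float, window_size: int) -> float:
    """Total normalized EWMA weight on observations older than `window_size`."""
    lam = 2.0 ** (-1.0 / half_life)
    return lam ** window_size


tail_150 = tail_weight(23, 150)  # roughly 1% of the total weight
tail_252 = tail_weight(23, 252)  # well below 0.1% of the total weight
```

Since \(\lambda^k = 2^{-k/\text{half-life}}\), each additional half-life of history kept in the window halves the discarded weight.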
- Attributes:
- variance_ : ndarray of shape (n_assets,)
Estimated variance vector. Contains NaN for assets that are inactive or that have not yet accumulated min_observations valid observations.
- location_ : ndarray of shape (n_assets,)
Estimated location (mean). If assume_centered=True, this is zeros. Otherwise, it tracks the EWMA mean of returns. Contains NaN for inactive assets when assume_centered=False.
- n_features_in_ : int
Number of assets seen during fit.
- feature_names_in_ : ndarray of shape (n_features_in_,)
Names of features seen during fit. Defined only when X has feature names that are all strings.
Methods
fit(X[, y, active_mask]): Fit the Exponentially Weighted Variance estimator.
get_metadata_routing(): Get metadata routing of this object.
get_params([deep]): Get parameters for this estimator.
partial_fit(X[, y, active_mask]): Incrementally fit the Exponentially Weighted Variance estimator.
set_fit_request(*[, active_mask]): Configure whether metadata should be requested to be passed to the fit method.
set_params(**params): Set the parameters of this estimator.
set_partial_fit_request(*[, active_mask]): Configure whether metadata should be requested to be passed to the partial_fit method.
Examples
>>> import numpy as np
>>> from skfolio.datasets import load_sp500_dataset
>>> from skfolio.moments import EWVariance
>>> from skfolio.preprocessing import prices_to_returns
>>>
>>> prices = load_sp500_dataset()
>>> X = prices_to_returns(prices)
>>>
>>> # Batch fitting
>>> model = EWVariance(half_life=40)
>>> model.fit(X)
>>> print(model.variance_.shape)
>>>
>>> # Streaming updates with partial_fit
>>> model2 = EWVariance(half_life=20)
>>> model2.partial_fit(X[:100])  # Initial fit
>>> model2.partial_fit(X[100:200])  # Update with new data
>>> model2.partial_fit(X[200:])  # Continue updating
>>>
>>> # NaN-aware fitting with active_mask
>>> # Asset 2 is listed starting from observation 50
>>> active_mask = np.ones(X.shape, dtype=bool)
>>> active_mask[:50, 2] = False
>>> X_nan = X.copy()
>>> # X is a DataFrame, so positional assignment goes through .iloc
>>> X_nan.iloc[:50, 2] = np.nan
>>> model3 = EWVariance(half_life=40)
>>> model3.fit(X_nan, active_mask=active_mask)
- fit(X, y=None, *, active_mask=None)
Fit the Exponentially Weighted Variance estimator.
- Parameters:
- X : array-like of shape (n_observations, n_assets)
Price returns of the assets. NaN values are allowed and handled robustly.
- y : Ignored
Not used, present for API consistency by convention.
- active_mask : array-like of shape (n_observations, n_assets), optional
Boolean mask indicating whether each asset is structurally active at each observation. Use this to distinguish between holidays (active_mask=True and NaN return: variance is frozen) and inactive periods such as pre-listing or post-delisting (active_mask=False: variance is set to NaN). If None (default), all assets are assumed active.
- Returns:
- self : EWVariance
Fitted estimator.
- get_metadata_routing()
Get metadata routing of this object.
Please check the User Guide on how the routing mechanism works.
- Returns:
- routing : MetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)
Get parameters for this estimator.
- Parameters:
- deep : bool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- params : dict
Parameter names mapped to their values.
- partial_fit(X, y=None, *, active_mask=None)
Incrementally fit the Exponentially Weighted Variance estimator.
This method allows for streaming/online updates to the variance estimate. Each call updates the internal state with new observations.
- Parameters:
- X : array-like of shape (n_observations, n_assets)
Price returns of the assets. NaN values are allowed and handled robustly.
- y : Ignored
Not used, present for API consistency by convention.
- active_mask : array-like of shape (n_observations, n_assets), optional
Boolean mask indicating whether each asset is structurally active at each observation. See fit for details.
- Returns:
- self : EWVariance
Fitted estimator.
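Because the EWMA recursion only needs the previous accumulator, feeding the data in chunks should give the same raw state as a single pass. The sketch below illustrates this property with a hypothetical `ewma_update` helper in plain Python. It does not call the library, and it ignores window_size truncation, warm-up and NaN handling.

```python
def ewma_update(acc: float, returns: list[float], lam: float) -> float:
    """Advance the raw EWMA accumulator over a chunk of returns."""
    for r in returns:
        acc = lam * acc + (1.0 - lam) * r * r
    return acc


lam = 2.0 ** (-1.0 / 20)  # half_life = 20
returns = [0.01, -0.02, 0.015, -0.005, 0.03, -0.01]

# One pass over the full history.
batch = ewma_update(0.0, returns, lam)

# The same history delivered in three chunks, carrying the state forward.
streamed = 0.0
for chunk in (returns[:2], returns[2:4], returns[4:]):
    streamed = ewma_update(streamed, chunk, lam)
```

Both paths apply the identical sequence of updates, so `batch` and `streamed` are equal.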
- set_fit_request(*, active_mask='$UNCHANGED$')
Configure whether metadata should be requested to be passed to the fit method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- Parameters:
- active_mask : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for active_mask parameter in fit.
- Returns:
- self : object
The updated object.
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
- **params : dict
Estimator parameters.
- Returns:
- self : estimator instance
Estimator instance.
- set_partial_fit_request(*, active_mask='$UNCHANGED$')
Configure whether metadata should be requested to be passed to the partial_fit method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to partial_fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to partial_fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- Parameters:
- active_mask : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for active_mask parameter in partial_fit.
- Returns:
- self : object
The updated object.