skfolio.moments.EWVariance

class skfolio.moments.EWVariance(half_life=40, assume_centered=True, min_observations=None, window_size=None)

Exponentially Weighted Variance estimator.

This is the variance-only counterpart of EWCovariance, computing only the diagonal elements (variances) and assuming zero correlation. This is appropriate when:

  • Estimating idiosyncratic (specific) risk in factor models, where residual returns are uncorrelated by construction

  • Working with orthogonalized or uncorrelated return series

  • The full covariance structure is not needed or is constructed separately

This estimator uses the recursive EWMA formula:

\[\sigma^2_{i,t} = \lambda \sigma^2_{i,t-1} + (1-\lambda) r_{i,t}^2\]

where \(\lambda\) is the decay factor, which determines how much weight is given to past observations. It is computed from the half-life parameter:

\[\lambda = 2^{-1/\text{half-life}}\]

The half-life is the number of observations for the weight to decay to 50%.
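The recursion and the half-life relation above can be sketched in pure Python. This is an illustrative sketch, not skfolio's implementation: the helper names `decay_factor` and `ewma_variance` are hypothetical, and starting the recursion from zero is an assumption made for the example.

```python
def decay_factor(half_life: float) -> float:
    """Decay factor lambda = 2 ** (-1 / half_life)."""
    return 2.0 ** (-1.0 / half_life)


def ewma_variance(returns: list[float], half_life: float) -> float:
    """Run sigma2_t = lam * sigma2_{t-1} + (1 - lam) * r_t**2 from sigma2 = 0."""
    lam = decay_factor(half_life)
    sigma2 = 0.0  # assumption: zero initial state
    for r in returns:
        sigma2 = lam * sigma2 + (1.0 - lam) * r * r
    return sigma2


half_life = 40
lam = decay_factor(half_life)

# Unrolling the recursion puts weight (1 - lam) * lam**k on the squared
# return observed k steps ago, so the weight halves every half_life steps.
weight_now = 1.0 - lam
weight_lagged = (1.0 - lam) * lam**half_life
print(weight_lagged / weight_now)

# The recursion and the explicit weighted sum agree.
returns = [0.01, -0.02, 0.015, 0.0, -0.005]
explicit = sum(
    (1.0 - lam) * lam**k * r * r for k, r in enumerate(reversed(returns))
)
print(abs(ewma_variance(returns, half_life) - explicit) < 1e-12)
```

The printed ratio is \(\lambda^{\text{half-life}}\), which is one half up to floating-point rounding.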

This estimator supports both batch fitting via fit and incremental updates via partial_fit, making it suitable for online learning scenarios.

NaN handling:

The estimator handles missing data (NaN returns) caused by late listings, delistings, and holidays by combining the EWMA update with active_mask, which flags whether each asset is active at time \(t\):

  • Active with valid (finite) return: Normal EWMA update.

  • Active with NaN return (holiday): Freeze; the previous variance is kept.

  • Inactive (active_mask=False), for example during pre-listing or post-delisting periods: Variance is set to NaN.

When active_mask is not provided, trailing NaN returns are ambiguous: they could correspond either to holidays, in which case the variance is frozen, or to inactive periods, in which case the variance is set to NaN.
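The three cases above can be sketched for a single asset as follows. This is a simplified illustration under stated assumptions, not the library's code: `update_variance` is a hypothetical helper, it tracks only the raw accumulator (no bias correction or warm-up), and it restarts from zero when an asset becomes active.

```python
import math


def update_variance(prev: float, r: float, active: bool, lam: float) -> float:
    """One EWMA step for a single asset, following the three documented cases."""
    if not active:
        return math.nan  # inactive: variance is set to NaN
    if math.isnan(r):
        return prev  # active with NaN return (holiday): freeze
    if math.isnan(prev):
        prev = 0.0  # assumption: recursion starts at zero on listing
    return lam * prev + (1.0 - lam) * r * r  # normal EWMA update


lam = 2.0 ** (-1.0 / 40)
returns = [math.nan, math.nan, 0.01, math.nan, -0.02]
active = [False, False, True, True, True]

sigma2 = math.nan
path = []
for r, a in zip(returns, active):
    sigma2 = update_variance(sigma2, r, a, lam)
    path.append(sigma2)

print(path)
```

The first two entries are NaN (pre-listing), the fourth repeats the third (holiday), and the last reflects a normal update.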

Late-listing bias correction:

When an asset becomes active (late listing), the EWMA recursion is initialized at zero rather than at the first squared return. This zero-initialization introduces a transient downward scale bias: after \(n_i\) valid observations, the raw EWMA weights sum to \((1 - \lambda^{n_i})\) instead of 1. At output time, a per-asset correction removes this bias:

\[\hat{\sigma}^2_i = \frac{S_i}{1 - \lambda^{n_i}}\]

where \(S_i\) is the raw internal EWMA accumulator. For assets with a long history, the correction is negligible (\(\lambda^{n_i} \to 0\)).
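A small numeric check of the correction, assuming for illustration that every squared return equals the same value `true_var` (a hypothetical name), so the unbiased answer is known in advance:

```python
lam = 2.0 ** (-1.0 / 40)  # half-life of 40 observations
true_var = 0.0004  # assumption: every squared return equals this value
n_obs = 10  # a recently listed asset with few valid observations

S = 0.0  # raw accumulator, zero-initialized at listing
for _ in range(n_obs):
    S = lam * S + (1.0 - lam) * true_var

# The raw weights sum to 1 - lam**n_obs, so S understates the variance.
raw_ratio = S / true_var
corrected = S / (1.0 - lam**n_obs)

print(raw_ratio, 1.0 - lam**n_obs)
print(corrected, true_var)
```

In this setup the raw accumulator recovers only the fraction \(1 - \lambda^{n_i}\) of the variance, and dividing by that fraction restores it.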

The min_observations parameter controls a warm-up period: an asset’s variance estimate remains NaN in the output until the asset has accumulated min_observations valid observations.

Parameters:
half_life : float, default=40

Half-life of the exponential weights in number of observations.

The half-life controls how quickly older observations lose their influence:

  • Larger half-life: More stable estimates, slower to adapt (robust to noise)

  • Smaller half-life: More responsive estimates, faster to adapt (sensitive to noise)

The decay factor \(\lambda\) is computed as: \(\lambda = 2^{-1/\text{half-life}}\)

For example:
  • half-life = 40: \(\lambda \approx 0.983\)

  • half-life = 23: \(\lambda \approx 0.970\)

  • half-life = 11: \(\lambda \approx 0.939\)

  • half-life = 6: \(\lambda \approx 0.891\)
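The values listed above follow directly from the formula and can be reproduced with a few lines of Python (the variable names are illustrative):

```python
# Decay factor for each half-life in the list above, rounded to 3 decimals.
lambdas = {h: round(2.0 ** (-1.0 / h), 3) for h in (40, 23, 11, 6)}

for half_life, lam in lambdas.items():
    print(f"half-life = {half_life:>2}: lambda = {lam:.3f}")
# half-life = 40: lambda = 0.983
# half-life = 23: lambda = 0.970
# half-life = 11: lambda = 0.939
# half-life =  6: lambda = 0.891
```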

Note

For portfolio optimization, larger half-lives (>= 20) are generally preferred to avoid excessive turnover from estimation noise.

assume_centered : bool, default=True

If True (default), the EWMA update uses raw returns without demeaning. This is the standard convention for EWMA variance estimation in finance. If False, returns are demeaned using an EWMA mean estimate before computing the variance update, and location_ tracks the EWMA mean.
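The difference between the two settings can be illustrated with a rough sketch. The exact demeaning scheme is not specified here, so the code below makes an explicit assumption: the EWMA mean is updated first and the updated mean is then subtracted from the return. The helper `ewma_step` is hypothetical and may not match the library's internals.

```python
def ewma_step(mean, var, r, lam, assume_centered):
    """One EWMA step returning (mean, var) for a single asset."""
    if assume_centered:
        # Raw returns are used directly; the location stays at zero.
        return 0.0, lam * var + (1.0 - lam) * r * r
    # Assumption: update the EWMA mean first, then demean with it.
    mean = lam * mean + (1.0 - lam) * r
    dev = r - mean
    return mean, lam * var + (1.0 - lam) * dev * dev


lam = 2.0 ** (-1.0 / 40)
returns = [0.012, 0.008, 0.011, 0.009, 0.010]  # returns with a positive drift

mean_c = var_c = 0.0  # assume_centered=True
mean_d = var_d = 0.0  # assume_centered=False
for r in returns:
    mean_c, var_c = ewma_step(mean_c, var_c, r, lam, True)
    mean_d, var_d = ewma_step(mean_d, var_d, r, lam, False)

print(mean_c, var_c)
print(mean_d, var_d)
```

With a positive drift in the returns, the demeaned accumulator is smaller than the centered one, because part of each squared return is attributed to the mean.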

min_observations : int, optional

Minimum number of valid observations per asset before its variance estimate is considered reliable and exposed in the output variance_. Until this threshold is reached, the asset’s variance estimate remains NaN.

The default (None) uses int(half_life) as the threshold, ensuring the late-listing initialization bias has decayed to at most 50%. Set to 1 to disable warm-up entirely.
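A sketch of how the warm-up threshold could interact with the bias correction, assuming an estimate is exposed once the count of valid observations reaches `min_observations`. The helper `exposed_variance` is hypothetical and only mirrors the behaviour described above.

```python
import math


def exposed_variance(raw, n_valid, lam, min_observations):
    """Return NaN during warm-up, otherwise the bias-corrected variance."""
    if n_valid == 0 or n_valid < min_observations:
        return math.nan
    return raw / (1.0 - lam**n_valid)


half_life = 40
lam = 2.0 ** (-1.0 / half_life)
min_observations = int(half_life)  # documented default when None

raw, n_valid, outputs = 0.0, 0, []
for r in [0.01] * 45:  # 45 identical valid returns for illustration
    raw = lam * raw + (1.0 - lam) * r * r
    n_valid += 1
    outputs.append(exposed_variance(raw, n_valid, lam, min_observations))

n_hidden = sum(math.isnan(v) for v in outputs)
print(n_hidden, outputs[-1])
print(lam**min_observations)  # remaining initialization bias at the threshold
```

With an integer half-life, the remaining weight \(\lambda^{n_i}\) at the default threshold is one half, which is the 50% figure quoted above.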

window_size : int, optional

Window size to truncate data to the last window_size observations before fitting. Only applies to the initial fit call (or equivalently, the first partial_fit call); subsequent partial_fit calls use all provided data.

This is a computational optimization for very long time series. Due to exponential decay, observations far in the past contribute negligibly to the current estimate. For example, with half-life = 23 (\(\lambda \approx 0.97\)), observations more than ~150 periods old together carry only about 1% of the total weight. Truncating to a reasonable window (e.g., 252 trading days) speeds up computation without materially affecting results.

The default (None) uses all available data.
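The truncation argument can be checked numerically. For an infinitely long history, the observations older than the last \(n\) carry a total weight of \(\lambda^n\); the sketch below evaluates that tail for a few window sizes (the function name `tail_weight` is illustrative).

```python
half_life = 23
lam = 2.0 ** (-1.0 / half_life)


def tail_weight(n: int) -> float:
    """Total EWMA weight on observations older than the most recent n.

    Sum over k >= n of (1 - lam) * lam**k, which simplifies to lam**n.
    """
    return lam**n


for window in (150, 252, 500):
    print(f"window = {window}: discarded weight = {tail_weight(window):.2e}")

# Smallest window whose discarded weight falls below 1%.
n_one_percent = 0
while tail_weight(n_one_percent) >= 0.01:
    n_one_percent += 1
print(n_one_percent)
```

For this half-life the discarded weight drops below 1% at 153 observations, and a 252-observation window discards well under 0.1%.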

Attributes:
variance_ : ndarray of shape (n_assets,)

Estimated variance vector. Contains NaN for assets that are inactive or that have not yet accumulated min_observations valid observations.

location_ : ndarray of shape (n_assets,)

Estimated location (mean). If assume_centered=True, this is zeros. Otherwise, it tracks the EWMA mean of returns. Contains NaN for inactive assets when assume_centered=False.

n_features_in_ : int

Number of assets seen during fit.

feature_names_in_ : ndarray of shape (n_features_in_,)

Names of features seen during fit. Defined only when X has feature names that are all strings.

Methods

fit(X[, y, active_mask])

Fit the Exponentially Weighted Variance estimator.

get_metadata_routing()

Get metadata routing of this object.

get_params([deep])

Get parameters for this estimator.

partial_fit(X[, y, active_mask])

Incrementally fit the Exponentially Weighted Variance estimator.

set_fit_request(*[, active_mask])

Configure whether metadata should be requested to be passed to the fit method.

set_params(**params)

Set the parameters of this estimator.

set_partial_fit_request(*[, active_mask])

Configure whether metadata should be requested to be passed to the partial_fit method.

Examples

>>> import numpy as np
>>> from skfolio.datasets import load_sp500_dataset
>>> from skfolio.moments import EWVariance
>>> from skfolio.preprocessing import prices_to_returns
>>>
>>> prices = load_sp500_dataset()
>>> X = prices_to_returns(prices)
>>>
>>> # Batch fitting
>>> model = EWVariance(half_life=40)
>>> model.fit(X)
>>> print(model.variance_.shape)
>>>
>>> # Streaming updates with partial_fit
>>> model2 = EWVariance(half_life=20)
>>> model2.partial_fit(X[:100])  # Initial fit
>>> model2.partial_fit(X[100:200])  # Update with new data
>>> model2.partial_fit(X[200:])  # Continue updating
>>>
>>> # NaN-aware fitting with active_mask
>>> # Asset 2 is listed starting from observation 50
>>> active_mask = np.ones(X.shape, dtype=bool)
>>> active_mask[:50, 2] = False
>>> X_nan = X.copy()
>>> X_nan.iloc[:50, 2] = np.nan  # X is a DataFrame, so index by position
>>> model3 = EWVariance(half_life=40)
>>> model3.fit(X_nan, active_mask=active_mask)

fit(X, y=None, *, active_mask=None)

Fit the Exponentially Weighted Variance estimator.

Parameters:
X : array-like of shape (n_observations, n_assets)

Price returns of the assets. NaN values are allowed and handled robustly.

y : Ignored

Not used, present for API consistency by convention.

active_mask : array-like of shape (n_observations, n_assets), optional

Boolean mask indicating whether each asset is structurally active at each observation. Use this to distinguish between holidays (active_mask=True and NaN return: variance is frozen) and inactive periods such as pre-listing or post-delisting (active_mask=False: variance is set to NaN). If None (default), all assets are assumed active.

Returns:
self : EWVariance

Fitted estimator.

get_metadata_routing()

Get metadata routing of this object.

Please check the User Guide on how the routing mechanism works.

Returns:
routing : MetadataRequest

A MetadataRequest encapsulating routing information.

get_params(deep=True)

Get parameters for this estimator.

Parameters:
deep : bool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
params : dict

Parameter names mapped to their values.

partial_fit(X, y=None, *, active_mask=None)

Incrementally fit the Exponentially Weighted Variance estimator.

This method allows for streaming/online updates to the variance estimate. Each call updates the internal state with new observations.

Parameters:
X : array-like of shape (n_observations, n_assets)

Price returns of the assets. NaN values are allowed and handled robustly.

y : Ignored

Not used, present for API consistency by convention.

active_mask : array-like of shape (n_observations, n_assets), optional

Boolean mask indicating whether each asset is structurally active at each observation. See fit for details.

Returns:
self : EWVariance

Fitted estimator.
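The reason incremental updates are natural for this estimator is that the EWMA recursion only needs the previous state. The pure-Python sketch below illustrates this for the raw recursion on one asset; it is not a test of skfolio's partial_fit, and `ewma_update` is a hypothetical helper.

```python
def ewma_update(state: float, returns: list[float], lam: float) -> float:
    """Advance the raw EWMA accumulator over a chunk of returns."""
    for r in returns:
        state = lam * state + (1.0 - lam) * r * r
    return state


lam = 2.0 ** (-1.0 / 20)
returns = [0.01, -0.02, 0.015, 0.0, -0.005, 0.007, -0.012, 0.003]

# One pass over the full history.
batch = ewma_update(0.0, returns, lam)

# The same history fed in three chunks, carrying the state forward.
stream = 0.0
for chunk in (returns[:3], returns[3:5], returns[5:]):
    stream = ewma_update(stream, chunk, lam)

print(batch == stream)
```

Because the chunked loop performs the same arithmetic in the same order, the two results are identical in this sketch.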

set_fit_request(*, active_mask='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
active_mask : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for active_mask parameter in fit.

Returns:
self : object

The updated object.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:
**params : dict

Estimator parameters.

Returns:
self : estimator instance

Estimator instance.

set_partial_fit_request(*, active_mask='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the partial_fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to partial_fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to partial_fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
active_mask : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for active_mask parameter in partial_fit.

Returns:
self : object

The updated object.