skfolio.moments.EWMu

class skfolio.moments.EWMu(half_life=None, min_observations=None, window_size=None, alpha=None)

Exponentially Weighted Expected Returns (Mu) estimator.

This estimator uses the recursive EWMA formula:

\[\mu_t = \lambda \mu_{t-1} + (1-\lambda) r_t\]

where \(\lambda\) is the decay factor, which determines how much weight is given to past observations. It is computed from the half-life parameter:

\[\lambda = 2^{-1/\text{half-life}}\]

The half-life is the number of observations after which the weight of an observation has decayed to 50% of its initial value.
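As a minimal plain-Python sketch of the recursion and the half-life relation above (not the library's implementation; `decay_factor` and `ewma` are hypothetical helper names, and the recursion is assumed to start from zero):

```python
def decay_factor(half_life):
    """Decay factor lambda = 2 ** (-1 / half_life)."""
    return 2.0 ** (-1.0 / half_life)


def ewma(returns, lam, mu0=0.0):
    """Apply mu_t = lam * mu_{t-1} + (1 - lam) * r_t over a sequence."""
    mu = mu0
    for r in returns:
        mu = lam * mu + (1.0 - lam) * r
    return mu


lam = decay_factor(40)

# The weight of an observation k steps in the past is (1 - lam) * lam ** k,
# so after `half_life` steps it is half the weight of the newest observation.
w_new = (1.0 - lam) * lam ** 0
w_old = (1.0 - lam) * lam ** 40
print(round(lam, 3), round(w_old / w_new, 3))  # 0.983 0.5
```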

This estimator supports both batch fitting via fit and incremental updates via partial_fit, making it suitable for online learning scenarios.

NaN handling:

The estimator handles missing data (NaN returns) caused by late listings, delistings, and holidays using EWMA updates together with active_mask. An asset with active_mask=True is treated as active at time \(t\). If its return is finite, the EWMA is updated normally. If its return is NaN, the observation is treated as a holiday and the previous estimate is kept. An asset with active_mask=False is treated as inactive, for example during pre-listing or post-delisting periods, and its estimate is set to NaN.

  • Active with valid return: Normal EWMA update.

  • Active with NaN return (holiday): Freeze; the previous estimate is kept.

  • Inactive (active_mask=False): The estimate is set to NaN.

When active_mask is not provided, trailing NaN returns are ambiguous: they could correspond either to holidays, in which case the mean is frozen, or to inactive periods, in which case the mean is set to NaN.
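The three cases above can be sketched as a single per-asset update step. This is an illustration only, not the library's code: `ewma_step` is a hypothetical helper, it omits the bias correction and warm-up described below, and it assumes that a NaN previous estimate restarts the recursion from zero.

```python
import math


def ewma_step(mu_prev, r, active, lam):
    """One per-asset update following the three NaN-handling cases."""
    if not active:
        return math.nan  # inactive: the estimate is set to NaN
    if math.isnan(r):
        return mu_prev  # holiday: freeze the previous estimate
    # Assumption: an asset with no usable estimate restarts from zero.
    base = 0.0 if math.isnan(mu_prev) else mu_prev
    return lam * base + (1.0 - lam) * r  # normal EWMA update


lam = 0.5
mu = 0.02
mu = ewma_step(mu, 0.04, True, lam)       # valid return: moves to about 0.03
mu = ewma_step(mu, math.nan, True, lam)   # holiday: unchanged
mu = ewma_step(mu, math.nan, False, lam)  # inactive: NaN
```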

Late-listing bias correction:

When an asset becomes active (late listing), the EWMA recursion is initialized at zero rather than at the first return. This zero-initialization introduces a transient bias toward zero: after \(n_i\) valid observations, the raw EWMA weights sum to \((1 - \lambda^{n_i})\) instead of 1. At output time, a per-asset correction removes this bias:

\[\hat{\mu}_i = \frac{S_i}{1 - \lambda^{n_i}}\]

where \(S_i\) is the raw internal EWMA accumulator. For assets with a long history, the correction is negligible (\(\lambda^{n_i} \to 0\)).
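A small sketch of the correction for a single asset, assuming a constant return series so the effect is easy to verify by hand (`corrected_mean` is a hypothetical helper, not part of skfolio):

```python
import math


def corrected_mean(returns, lam):
    """Return (raw accumulator S, corrected estimate S / (1 - lam ** n))."""
    s, n = 0.0, 0
    for r in returns:
        s = lam * s + (1.0 - lam) * r
        n += 1
    if n == 0:
        return s, math.nan  # no observation yet, nothing to correct
    return s, s / (1.0 - lam ** n)


lam = 2.0 ** (-1.0 / 40)
raw, corrected = corrected_mean([0.01] * 10, lam)

# With a constant return of 0.01 the raw accumulator equals
# 0.01 * (1 - lam ** 10), while the corrected value recovers 0.01.
print(raw < 0.01, abs(corrected - 0.01) < 1e-12)  # True True
```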

The min_observations parameter controls a warm-up period: an asset’s mean estimate remains NaN in the output until it has accumulated enough valid observations for a reliable estimate.

Parameters:
half_life : float, default=40

Half-life of the exponential weights in number of observations.

The half-life controls how quickly older observations lose their influence:

  • Larger half-life: More stable estimates, slower to adapt (robust to noise)

  • Smaller half-life: More responsive estimates, faster to adapt (sensitive to noise)

The decay factor \(\lambda\) is computed as: \(\lambda = 2^{-1/\text{half-life}}\)

For example:
  • half-life = 40: \(\lambda \approx 0.983\)

  • half-life = 23: \(\lambda \approx 0.970\)

  • half-life = 11: \(\lambda \approx 0.939\)

  • half-life = 6: \(\lambda \approx 0.891\)
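The values listed above can be reproduced directly from the formula. The snippet also prints \(1 - \lambda\), the weight the recursion places on the newest observation (the name `decay_factors` is illustrative):

```python
# Decay factor and weight on the newest observation for each half-life.
decay_factors = {h: 2.0 ** (-1.0 / h) for h in (40, 23, 11, 6)}

for half_life, lam in decay_factors.items():
    print(f"half_life={half_life:>2}  lambda={lam:.3f}  newest_weight={1.0 - lam:.3f}")
```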

Note

For portfolio optimization, larger half-lives (>= 20) are generally preferred to avoid excessive turnover from estimation noise.

min_observations : int, optional

Minimum number of valid observations per asset before its mean estimate is considered reliable and exposed in the output mu_. Until this threshold is reached, the asset’s mean estimate remains NaN.

The default (None) uses int(half_life) as the threshold, so that the weight \(\lambda^{n_i}\) still attached to the zero initialization has decayed to about 50% by the time the estimate is exposed. Set to 1 to disable warm-up entirely.
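The warm-up rule can be sketched as a gate in front of the bias-corrected estimate. This is a simplified illustration under the assumptions stated in the comments, with `exposed_mean` as a hypothetical helper rather than a skfolio function:

```python
import math


def exposed_mean(s, n_valid, lam, min_observations):
    """Corrected mean, or NaN while the asset is still warming up.

    `s` is the raw zero-initialized accumulator and `n_valid` the number
    of valid observations seen so far (both assumed tracked per asset).
    """
    if n_valid < max(min_observations, 1):
        return math.nan
    return s / (1.0 - lam ** n_valid)


half_life = 40
lam = 2.0 ** (-1.0 / half_life)
min_observations = int(half_life)  # the documented default threshold

# At the threshold, the weight left on the zero initialization is about 0.5.
remaining = lam ** min_observations
```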

window_size : int, optional

Window size to truncate data to the last window_size observations before fitting. Only applies to the initial fit call (or equivalently, the first partial_fit call); subsequent partial_fit calls use all provided data.

This is a computational optimization for very long time series. Due to exponential decay, observations far in the past contribute negligibly to the current estimate. For example, with half-life = 23 (\(\lambda \approx 0.97\)), observations older than about 150 periods together carry roughly 1% of the total weight. Truncating to a reasonable window (e.g., 252 trading days) speeds up computation without materially affecting results.

The default (None) uses all available data.
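To make the truncation argument concrete, the sketch below computes the share of weight beyond a given age and compares a full-history estimate with one truncated to 252 observations. The data is synthetic and `corrected_ewma` is a hypothetical helper that mimics the zero-initialized, bias-corrected recursion described above, not the library's code.

```python
def corrected_ewma(returns, lam):
    """Zero-initialized EWMA of a non-empty sequence, bias-corrected."""
    s = 0.0
    for r in returns:
        s = lam * s + (1.0 - lam) * r
    return s / (1.0 - lam ** len(returns))


lam = 2.0 ** (-1.0 / 23)

# Share of total weight on observations older than n periods is lam ** n.
tail_150 = lam ** 150  # roughly 1%
tail_252 = lam ** 252  # about 0.05%

# Synthetic, deterministic returns (illustrative data only).
returns = [0.001 * ((i % 7) - 3) for i in range(1000)]
full = corrected_ewma(returns, lam)
truncated = corrected_ewma(returns[-252:], lam)
print(abs(full - truncated) < 1e-5)
```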

alpha : float, optional

Deprecated since version 0.17.0: alpha is deprecated and will be removed in a future version. Use half_life instead. Note: alpha = 1 - decay_factor and half_life = -ln(2) / ln(1 - alpha).
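When migrating from alpha to half_life, the relation in the note above can be applied as follows (the two function names are illustrative helpers, not part of skfolio):

```python
import math


def alpha_to_half_life(alpha):
    """half_life = -ln(2) / ln(1 - alpha), valid for 0 < alpha < 1."""
    return -math.log(2.0) / math.log(1.0 - alpha)


def half_life_to_alpha(half_life):
    """alpha = 1 - decay_factor = 1 - 2 ** (-1 / half_life)."""
    return 1.0 - 2.0 ** (-1.0 / half_life)


# An alpha of 0.03 corresponds to a half-life of roughly 22.8 observations.
print(round(alpha_to_half_life(0.03), 1))
```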

Attributes:
mu_ : ndarray of shape (n_assets,)

Estimated expected returns of the assets. Contains NaN for assets that are inactive or have not yet accumulated min_observations valid observations.

n_features_in_ : int

Number of assets seen during fit.

feature_names_in_ : ndarray of shape (n_features_in_,)

Names of assets seen during fit. Defined only when X has asset names that are all strings.

Methods

fit(X[, y, active_mask])

Fit the EWMu estimator model.

get_metadata_routing()

Get metadata routing of this object.

get_params([deep])

Get parameters for this estimator.

partial_fit(X[, y, active_mask])

Incrementally fit the EWMu estimator.

set_fit_request(*[, active_mask])

Configure whether metadata should be requested to be passed to the fit method.

set_params(**params)

Set the parameters of this estimator.

set_partial_fit_request(*[, active_mask])

Configure whether metadata should be requested to be passed to the partial_fit method.

See also

Online Evaluation of Portfolio Optimization

Online evaluation of portfolio optimization using MeanRisk with EWMu and exponentially weighted covariance estimators.

Examples

>>> from skfolio.datasets import load_sp500_dataset
>>> from skfolio.moments import EWMu
>>> from skfolio.preprocessing import prices_to_returns
>>>
>>> prices = load_sp500_dataset()
>>> X = prices_to_returns(prices)
>>>
>>> # Batch fitting
>>> model = EWMu(half_life=40)
>>> model.fit(X)
>>> print(model.mu_.shape)
>>>
>>> # Streaming updates with partial_fit
>>> model2 = EWMu(half_life=20)
>>> model2.partial_fit(X[:100])  # Initial fit
>>> model2.partial_fit(X[100:200])  # Update with new data
>>> model2.partial_fit(X[200:])  # Continue updating
>>>
>>> # NaN-aware fitting with active_mask
>>> import numpy as np
>>> # Asset 2 is listed starting from observation 50
>>> active_mask = np.ones(X.shape, dtype=bool)
>>> active_mask[:50, 2] = False
>>> X_nan = X.copy()
>>> X_nan.iloc[:50, 2] = np.nan  # X is a DataFrame: use positional indexing
>>> model3 = EWMu(half_life=40)
>>> model3.fit(X_nan, active_mask=active_mask)

fit(X, y=None, *, active_mask=None)

Fit the EWMu estimator model.

Parameters:
X : array-like of shape (n_observations, n_assets)

Price returns of the assets. May contain NaN for missing data (holidays, late listings, delistings).

y : Ignored

Not used, present for API consistency by convention.

active_mask : array-like of shape (n_observations, n_assets), optional

Boolean mask indicating whether each asset is structurally active at each observation. Use this to distinguish between holidays (active_mask=True and NaN return: mean is frozen) and inactive periods such as pre-listing or post-delisting (active_mask=False: mean is set to NaN). If None (default), all assets are assumed active.

Returns:
self : EWMu

Fitted estimator.

get_metadata_routing()

Get metadata routing of this object.

Please check the User Guide on how the routing mechanism works.

Returns:
routing : MetadataRequest

A MetadataRequest encapsulating routing information.

get_params(deep=True)

Get parameters for this estimator.

Parameters:
deep : bool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
params : dict

Parameter names mapped to their values.

partial_fit(X, y=None, *, active_mask=None)

Incrementally fit the EWMu estimator.

This method allows for streaming/online updates to the expected returns estimate. Each call updates the internal state with new observations.
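The idea behind incremental updating can be illustrated with a plain-Python sketch in which the state carried between calls is an accumulator and an observation count. This is an assumption made for illustration: `update` is a hypothetical helper, and the actual estimator additionally handles NaNs, active_mask, warm-up, and window_size.

```python
def update(state, returns, lam):
    """Fold a chunk of returns into the (accumulator, count) state."""
    s, n = state
    for r in returns:
        s = lam * s + (1.0 - lam) * r
        n += 1
    return s, n


lam = 2.0 ** (-1.0 / 20)
data = [0.001 * ((i % 5) - 2) for i in range(300)]  # synthetic returns

# One pass over all the data ...
batch = update((0.0, 0), data, lam)

# ... gives the same state as feeding the data chunk by chunk.
stream = (0.0, 0)
for chunk in (data[:100], data[100:200], data[200:]):
    stream = update(stream, chunk, lam)

print(batch == stream)  # True
```

Because the recursion only depends on the previous state and the new observations, no past data needs to be stored between updates.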

Parameters:
X : array-like of shape (n_observations, n_assets)

Price returns of the assets. May contain NaN for missing data (holidays, late listings, delistings).

y : Ignored

Not used, present for API consistency by convention.

active_mask : array-like of shape (n_observations, n_assets), optional

Boolean mask indicating whether each asset is structurally active at each observation. Use this to distinguish between holidays (active_mask=True and NaN return: mean is frozen) and inactive periods such as pre-listing or post-delisting (active_mask=False: mean is set to NaN). If None (default), all assets are assumed active.

Returns:
self : EWMu

Fitted estimator.

set_fit_request(*, active_mask='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
active_mask : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for active_mask parameter in fit.

Returns:
self : object

The updated object.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:
**params : dict

Estimator parameters.

Returns:
self : estimator instance

Estimator instance.

set_partial_fit_request(*, active_mask='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the partial_fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to partial_fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to partial_fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
active_mask : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for active_mask parameter in partial_fit.

Returns:
self : object

The updated object.