Data Preparation

Most `fit` methods of skfolio estimators take asset returns as input `X`. The choice of methodology for converting prices to returns is therefore left to the user.

There are two different notions of return:

Linear return

Linear return (or simple return) is defined as:

\[R^{Lin}_{t} = \frac{S_{t}}{S_{t-1}} - 1\]

Linear returns aggregate across securities, meaning that the linear return of a portfolio is the weighted sum of the linear returns of its components:

\[R^{Lin}_{t} = \sum_{i=1}^{N} w_{i} \times R^{Lin}_{i,t}\]

This property is needed to properly compute portfolio return and risk ([5]). However, linear returns cannot be aggregated across time.
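Both properties can be checked on a small example. The sketch below uses made-up prices for two hypothetical assets (the values, the 60/40 weights, and the helper `linear_returns` are illustrative assumptions, not part of skfolio) and compares the realized portfolio return with the weighted sum of component returns, then the two-period return with the sum of single-period returns.

```python
# Hypothetical prices for two assets on three consecutive dates
# (illustrative values only, not taken from any dataset).
prices_a = [100.0, 110.0, 99.0]
prices_b = [50.0, 45.0, 54.0]


def linear_returns(prices):
    """Return the simple returns S_t / S_{t-1} - 1 of a price series."""
    return [curr / prev - 1.0 for prev, curr in zip(prices, prices[1:])]


r_a = linear_returns(prices_a)
r_b = linear_returns(prices_b)

# Aggregation across securities (first period): invest 60% in A and 40% in B.
w_a, w_b = 0.6, 0.4
initial_value = 1000.0
units_a = w_a * initial_value / prices_a[0]
units_b = w_b * initial_value / prices_b[0]
value_after = units_a * prices_a[1] + units_b * prices_b[1]

portfolio_return = value_after / initial_value - 1.0
weighted_sum = w_a * r_a[0] + w_b * r_b[0]
print(portfolio_return, weighted_sum)  # the two values agree

# No aggregation across time: summing the single-period returns of A
# does not give its two-period return, whereas compounding does.
two_period_return = prices_a[-1] / prices_a[0] - 1.0
summed = sum(r_a)
compounded = (1.0 + r_a[0]) * (1.0 + r_a[1]) - 1.0
print(two_period_return, summed, compounded)
```

In this example the weighted sum reproduces the portfolio return exactly, while the summed single-period returns miss the two-period return by about one percentage point.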

Logarithmic return

Logarithmic return (or continuously compounded return) is defined as:

\[R^{Log}_{t} = ln\Biggl(\frac{S_{t}}{S_{t-1}}\Biggr)\]

Logarithmic returns aggregate across time, meaning that the logarithmic return over k periods is the sum of all single-period logarithmic returns:

\[R^{Log}_{t..k} = ln\Biggl(\frac{S_{t+k}}{S_{t}}\Biggr) = \sum_{j=1}^{k} ln\Biggl(\frac{S_{t+j}}{S_{t+j-1}}\Biggr)= \sum_{j=1}^{k} R^{Log}_{t+j}\]

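Given this property, it is easy to scale logarithmic returns from one time period to another. However, logarithmic returns cannot be aggregated across securities, since the logarithmic return of a portfolio is a nonlinear function of the linear returns of its components: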

\[R^{Log}_{t} = ln\Biggl(\frac{S_{t}}{S_{t-1}}\Biggr) = ln\Biggl(1+\sum_{i=1}^{N} w_{i} \times R^{Lin}_{i,t}\Biggr)\]
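A quick numerical check of both statements, again on made-up prices for two hypothetical assets (the prices, the 60/40 weights, and the helper `log_returns` are assumptions chosen for illustration):

```python
import math

# Hypothetical prices for two assets on three consecutive dates
# (illustrative values only, not taken from any dataset).
prices_a = [100.0, 110.0, 99.0]
prices_b = [50.0, 45.0, 54.0]


def log_returns(prices):
    """Return the logarithmic returns ln(S_t / S_{t-1}) of a price series."""
    return [math.log(curr / prev) for prev, curr in zip(prices, prices[1:])]


# Aggregation across time: single-period log returns sum to the
# log return over the whole window.
log_a = log_returns(prices_a)
total_log_a = math.log(prices_a[-1] / prices_a[0])
print(sum(log_a), total_log_a)  # the two values agree

# No aggregation across securities (first period, 60% in A and 40% in B):
# the portfolio log return is ln(1 + weighted sum of linear returns),
# which differs from the weighted sum of the components' log returns.
w_a, w_b = 0.6, 0.4
lin_a = prices_a[1] / prices_a[0] - 1.0
lin_b = prices_b[1] / prices_b[0] - 1.0
portfolio_log_return = math.log(1.0 + w_a * lin_a + w_b * lin_b)
naive_weighted_log = w_a * math.log(1.0 + lin_a) + w_b * math.log(1.0 + lin_b)
print(portfolio_log_return, naive_weighted_log)
```

The gap between the last two values grows with the size of the returns, which is why the two notions look interchangeable over short horizons but diverge over long ones.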

Pitfall in Portfolio Optimization

Given the similarities of linear and logarithmic returns in the short run, they are sometimes used interchangeably. It is not uncommon to witness the following steps ([1], [2], [3]):

1. Take the daily prices \(S_{t}, S_{t+1}, ...,\) for all the n securities
2. Transform the daily prices to daily logarithmic returns
3. Estimate the expected returns vector \(\mu\) and covariance matrix \(\Sigma\) from the daily logarithmic returns
4. Determine the investment horizon, for example k = 252 days
5. Project the expected returns and covariance to the horizon using the square-root rule: \(\mu_{k} \equiv k \times \mu\) and \(\Sigma_{k} \equiv k \times \Sigma\)
6. Compute the mean-variance efficient frontier \(\max_{w} \Biggl\{ w^T \mu - \lambda \times w^T \Sigma w \Biggr\}\)

The above approach is incorrect. First, the square-root rule in (5) only applies under the assumption that the logarithmic returns are invariants (they behave identically and independently across time). This is approximately true for stocks, but not for bonds or for most derivatives such as options. Second, even for stocks, the optimization in (6) is ill-posed: because logarithmic returns do not aggregate across securities, \(w^T \mu\) is not the expected return of the portfolio over the horizon and \(w^T \Sigma w\) is not its variance. This leads to suboptimal allocations, and the efficient frontier would not depend on the investment horizon.

The correct approach

The correct general approach is the following:

1. Find the market invariants (logarithmic return for stocks, change in yield to maturity for bonds, etc.)
2. Estimate the joint distribution of the market invariants over the time period of estimation
3. Project the distribution of invariants to the time period of investment
4. Map the distribution of invariants into the distribution of security prices at the investment horizon through a pricing function
5. Compute the distribution of linear returns from the distribution of prices

Example for stocks

1. Take the prices \(S_{t}, S_{t+1}, ...,\) (for example daily) for all the n securities
2. Transform the daily prices to daily logarithmic returns. Note that the linear return is also a market invariant for stocks; however, the logarithmic return simplifies steps 3) and 4).
3. Estimate the joint distribution of market invariants by fitting parametrically the daily logarithmic returns to a multivariate normal distribution: estimate the joint distribution parameters \(\mu^{Log}_{daily}\) and \(\Sigma^{Log}_{daily}\)
4. Project the distribution of invariants to the time period of investment (for example one year, i.e. 252 business days). Because logarithmic returns are additive across time, we have ([4], [7]):
   - \[\mu^{Log}_{yearly} = 252 \times \mu^{Log}_{daily}\]
   - \[\Sigma^{Log}_{yearly} = 252 \times \Sigma^{Log}_{daily}\]
5. Compute the distribution of linear returns at the investment horizon. Using the characteristic function of the normal distribution, and the pricing function \(S_{yearly} = S_{0} e^{R^{Log}_{yearly}}\), we get:
   - \[\mathbb{E}(S_{yearly}) = \pmb{s}_{0} \circ exp\Biggl(\pmb{\mu}^{Log}_{yearly} + \frac{1}{2} diag\Biggl(\pmb{\Sigma}^{Log}_{yearly}\Biggr)\Biggr)\]
   - \[Cov(S_{yearly}) = \mathbb{E}(S_{yearly})\mathbb{E}(S_{yearly})^T \circ \Biggl(exp\Biggl(\pmb{\Sigma}^{Log}_{yearly}\Biggr)-1\Biggr)\]

   From which we can estimate the moments of the linear returns at the time horizon:
   - \[\pmb{\mu}^{Lin}_{yearly} = \frac{1}{\pmb{s}_{0} } \circ \mathbb{E}(S_{yearly}) -1\]
   - \[\pmb{\Sigma}^{Lin}_{yearly} = \frac{1}{\pmb{s}_{0}\pmb{s}_{0}^{T} } \circ Cov(S_{yearly})\]

Where \(\circ\) denotes the Hadamard product (element-wise product).

Note that we could have derived the distribution of linear returns from the distribution of logarithmic returns directly in this case. Here we demonstrated the general procedure.
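The projection and mapping steps of this example can be sketched in a few lines of NumPy. This is a minimal illustration under the multivariate normal assumption, not skfolio's internal implementation: the function name `project_log_normal_moments` and the daily parameters are hypothetical. Dividing the price moments by \(\pmb{s}_{0}\) cancels the initial prices, so the code works directly with the expected growth factor \(\mathbb{E}(S_{yearly}) / \pmb{s}_{0}\).

```python
import numpy as np


def project_log_normal_moments(mu_log_daily, sigma_log_daily, horizon=252):
    """Map daily log-return moments to linear-return moments at the horizon.

    Assumes the daily logarithmic returns are i.i.d. multivariate normal.
    """
    # Project the invariants: log-return moments scale linearly with time.
    mu_log = horizon * np.asarray(mu_log_daily, dtype=float)
    sigma_log = horizon * np.asarray(sigma_log_daily, dtype=float)

    # Expected growth factor E(S_h) / s_0 of each asset.
    growth = np.exp(mu_log + 0.5 * np.diag(sigma_log))

    # Moments of the linear returns at the horizon.
    mu_lin = growth - 1.0
    sigma_lin = np.outer(growth, growth) * (np.exp(sigma_log) - 1.0)
    return mu_lin, sigma_lin


# Hypothetical daily log-return parameters for two stocks (illustrative only).
mu_log_daily = np.array([0.0004, 0.0002])
sigma_log_daily = np.array([[1.0e-4, 2.0e-5],
                            [2.0e-5, 4.0e-5]])

mu_lin, sigma_lin = project_log_normal_moments(
    mu_log_daily, sigma_log_daily, horizon=252
)
print(mu_lin)
print(252 * mu_log_daily)  # naive scaling of the log-return mean, for comparison
print(sigma_lin)
```

With these inputs, the expected linear returns at the horizon exceed the naively scaled logarithmic means, because of the \(\frac{1}{2} diag(\pmb{\Sigma}^{Log}_{yearly})\) convexity term.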

In skfolio

In skfolio, the above can be achieved using `EmpiricalPrior` by setting `is_log_normal` to `True` and providing `investment_horizon`. The input `X` must be linear returns. The conversion to logarithmic returns is performed inside the estimator.

However, as seen in the example Investment Horizon, for frequently rebalanced portfolios (investment horizon less than a year), the general procedure and the simplified procedure below give very close results:

1. Take the prices \(S_{t}, S_{t+1}, ...,\) (for example daily) for all the n securities
2. Transform the daily prices to daily linear returns
3. Estimate the expected returns vector \(\mu\) and covariance matrix \(\Sigma\) from the daily linear returns
4. Compute the mean-variance efficient frontier \(\max_{w} \Biggl\{w^T \mu - \lambda \times w^T \Sigma w\Biggr\}\)

This simplified procedure is the default one used in all skfolio examples, as most portfolios are rebalanced more often than once a year.

In both cases, it is highly recommended to use linear returns for the input `X`. If you need to estimate the moments from logarithmic returns, the conversion from linear to logarithmic returns should be performed inside the estimator.

For bonds and options, the general procedure will be implemented in a future release. In the meantime you can use your own custom prior estimator.

References