# NCO - Combinatorial Purged CV

The previous tutorial introduced the `NestedClustersOptimization`.

In this tutorial, we will perform hyperparameter search using `GridSearchCV` and
distribution analysis with `CombinatorialPurgedCV`.

## Data

We load the S&P 500 dataset composed of the daily prices of 20 assets from the S&P 500 Index composition starting from 2015-01-02 up to 2022-12-28:

```
from plotly.io import show
from sklearn.model_selection import GridSearchCV, train_test_split
from skfolio import Population, RatioMeasure, RiskMeasure
from skfolio.cluster import HierarchicalClustering, LinkageMethod
from skfolio.datasets import load_sp500_dataset
from skfolio.distance import KendallDistance, PearsonDistance
from skfolio.model_selection import (
    CombinatorialPurgedCV,
    WalkForward,
    cross_val_predict,
    optimal_folds_number,
)
from skfolio.optimization import (
    EqualWeighted,
    MeanRisk,
    NestedClustersOptimization,
    RiskBudgeting,
)
from skfolio.preprocessing import prices_to_returns

prices = load_sp500_dataset()
prices = prices["2015":]
X = prices_to_returns(prices)
X_train, X_test = train_test_split(X, test_size=0.5, shuffle=False)
```
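To make the two preparation steps concrete, here is a minimal standard-library sketch of what they amount to conceptually. It assumes simple (linear) returns and a purely chronological split; `linear_returns`, `chronological_split` and the toy prices are hypothetical names and values for illustration, not part of skfolio or scikit-learn.

```python
import math


def linear_returns(prices):
    """Convert a price series into simple (linear) returns: p_t / p_{t-1} - 1."""
    return [curr / prev - 1 for prev, curr in zip(prices, prices[1:])]


def chronological_split(rows, test_size=0.5):
    """Split rows in time order (no shuffling): earliest rows train, latest rows test."""
    n_test = math.ceil(len(rows) * test_size)
    n_train = len(rows) - n_test
    return rows[:n_train], rows[n_train:]


# Hypothetical prices for a single asset, for illustration only.
toy_prices = [100.0, 110.0, 99.0, 108.9, 108.9]
toy_returns = linear_returns(toy_prices)
train, test = chronological_split(toy_returns, test_size=0.5)
print(len(train), len(test))
```

Keeping the split chronological matters here: shuffling would let the model train on observations that come after the ones it is tested on.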

## Model

We create two models: the NCO and the equal-weighted benchmark:

```
benchmark = EqualWeighted()
model_nco = NestedClustersOptimization(
    inner_estimator=MeanRisk(), clustering_estimator=HierarchicalClustering()
)
```

## Parameter Tuning

We find the model parameters that maximize the out-of-sample Sharpe ratio using
`GridSearchCV` with `WalkForward` cross-validation on the training set.
The `WalkForward` parameters are chosen to simulate a three-month (60 business days)
rolling portfolio fitted on the previous year (252 business days):

```
cv = WalkForward(train_size=252, test_size=60)

grid_search_hrp = GridSearchCV(
    estimator=model_nco,
    cv=cv,
    n_jobs=-1,
    param_grid={
        "inner_estimator__risk_measure": [RiskMeasure.VARIANCE, RiskMeasure.CVAR],
        "outer_estimator": [
            EqualWeighted(),
            RiskBudgeting(risk_measure=RiskMeasure.CVAR),
        ],
        "clustering_estimator__linkage_method": [
            LinkageMethod.SINGLE,
            LinkageMethod.WARD,
        ],
        "distance_estimator": [PearsonDistance(), KendallDistance()],
    },
)
grid_search_hrp.fit(X_train)
model_nco = grid_search_hrp.best_estimator_
print(model_nco)
```

```
NestedClustersOptimization(clustering_estimator=HierarchicalClustering(),
                           distance_estimator=PearsonDistance(),
                           inner_estimator=MeanRisk(risk_measure=CVaR),
                           outer_estimator=EqualWeighted())
```
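The rolling scheme described above can be pictured as a window that slides forward by one test period at a time. The sketch below is a standard-library illustration of that idea, not skfolio's implementation: `walk_forward_splits` is a hypothetical helper, the sample size of 1,000 observations is made up, and it assumes that an incomplete final test window is dropped.

```python
def walk_forward_splits(n_observations, train_size, test_size):
    """Yield (train, test) index ranges for a rolling walk-forward scheme.

    Assumption: a split is emitted only if a full test window fits.
    """
    start = 0
    while start + train_size + test_size <= n_observations:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        start += test_size  # slide the window forward by one test period


# Hypothetical sample size, for illustration only.
splits = list(walk_forward_splits(n_observations=1000, train_size=252, test_size=60))
for train, test in splits[:2]:
    print(f"train [{train.start}, {train.stop}) -> test [{test.start}, {test.stop})")
print(f"{len(splits)} splits")
```

Each candidate in the parameter grid is refitted once per split, so the grid above (2 x 2 x 2 x 2 = 16 candidates) multiplies the number of fits accordingly.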

## Prediction

We evaluate the two models using the same `WalkForward` object on the test set:

```
pred_bench = cross_val_predict(
    benchmark,
    X_test,
    cv=cv,
    portfolio_params=dict(name="Benchmark"),
)
pred_nco = cross_val_predict(
    model_nco,
    X_test,
    cv=cv,
    n_jobs=-1,
    portfolio_params=dict(name="NCO"),
)
```

Each predicted object is a `MultiPeriodPortfolio`.
For improved analysis, we can add them to a `Population`:

```
population = Population([pred_bench, pred_nco])
```

Let’s plot the composition of the rolling portfolios:

```
population.plot_composition(display_sub_ptf_name=False)
```

Let’s plot the cumulative returns of the rolling portfolios on the test set:

```
fig = population.plot_cumulative_returns()
show(fig)
```

## Analysis

The NCO outperforms the Benchmark on the test set for the measures below:

```
for ptf in population:
    print("=" * 25)
    print(" " * 8 + ptf.name)
    print("=" * 25)
    print(f"Ann. Sharpe ratio : {ptf.annualized_sharpe_ratio:0.2f}")
    print(f"CVaR ratio : {ptf.cvar_ratio:0.4f}")
    print("\n")
```

```
=========================
        Benchmark
=========================
Ann. Sharpe ratio : 0.88
CVaR ratio : 0.0235


=========================
        NCO
=========================
Ann. Sharpe ratio : 1.30
CVaR ratio : 0.0376
```
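For readers unfamiliar with these two measures, the sketch below shows one common way to compute them from a series of daily returns. It is an illustration under stated assumptions rather than skfolio's exact implementation: it assumes 252 periods per year, a sample standard deviation, and a naive historical CVaR taken as the average of the worst 5% of losses. The function names and the toy returns are hypothetical.

```python
import math
import statistics


def annualized_sharpe_ratio(returns, periods_per_year=252):
    """Mean over sample standard deviation, scaled by sqrt(periods per year)."""
    mean = statistics.fmean(returns)
    std = statistics.stdev(returns)  # assumption: sample (ddof=1) standard deviation
    return mean / std * math.sqrt(periods_per_year)


def cvar_ratio(returns, beta=0.95):
    """Mean return divided by a naive historical CVaR estimate.

    Assumption: CVaR is the average of the worst (1 - beta) share of losses,
    using at least one observation.
    """
    losses = sorted((-r for r in returns), reverse=True)  # largest loss first
    n_tail = max(1, round(len(losses) * (1 - beta)))
    cvar = statistics.fmean(losses[:n_tail])
    return statistics.fmean(returns) / cvar


# Hypothetical daily returns, for illustration only.
toy_returns = [0.01, -0.01, 0.02, 0.0]
sharpe = annualized_sharpe_ratio(toy_returns)
ratio = cvar_ratio(toy_returns)
print(f"Ann. Sharpe ratio : {sharpe:0.2f}")
print(f"CVaR ratio : {ratio:0.4f}")
```

Both measures are "higher is better" ratios of reward to risk, which is why they can be compared directly across the two portfolios.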

## Combinatorial Purged Cross-Validation

Using only one testing path (the historical path) may not be enough to compare both
models. For a more robust analysis, we can use `CombinatorialPurgedCV` to create
multiple testing paths from different combinations of training folds.

We choose `n_folds` and `n_test_folds` to obtain around 30 test paths and an average
training size of 252 days:

```
n_folds, n_test_folds = optimal_folds_number(
    n_observations=X_test.shape[0],
    target_n_test_paths=30,
    target_train_size=252,
)
cv = CombinatorialPurgedCV(n_folds=n_folds, n_test_folds=n_test_folds)
cv.summary(X_test)
```

```
Number of Observations             1006
Total Number of Folds                 9
Number of Test Folds                  7
Purge Size                            0
Embargo Size                          0
Average Training Size               223
Number of Test Paths                 28
Number of Training Combinations      36
dtype: int64
```
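The figures in this summary follow from simple combinatorics, which the short check below reproduces with the standard library. The formulas are the usual ones for combinatorial cross-validation; the integer rounding of the average training size is an assumption chosen to match the value shown, and purge and embargo are ignored since both are 0 here.

```python
import math

# Values taken from the summary above.
n_observations = 1006
n_folds = 9
n_test_folds = 7

# Ways of choosing which folds are held out for testing: C(9, 7).
n_combinations = math.comb(n_folds, n_test_folds)

# Each fold is tested in a share n_test_folds / n_folds of the combinations,
# and every test path needs exactly one prediction per fold.
n_test_paths = n_combinations * n_test_folds // n_folds

# The folds not used for testing form the training set.
avg_train_size = n_observations * (n_folds - n_test_folds) // n_folds

print(n_combinations, n_test_paths, avg_train_size)  # 36 28 223
```

With 7 of 9 folds held out, only two folds (about 223 observations) are used for each fit, which is why the average training size stays close to the 252-day target.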

```
pred_nco = cross_val_predict(
    model_nco,
    X_test,
    cv=cv,
    n_jobs=-1,
    portfolio_params=dict(tag="NCO"),
)
```

The predicted object is a `Population` of `MultiPeriodPortfolio`. Each
`MultiPeriodPortfolio` represents one testing path of a rolling portfolio.
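To see how test paths can be assembled from the individual train/test combinations, consider the sketch below. It enumerates every combination of test folds, then builds each path by picking, for every fold, one combination in which that fold was tested. The helper `build_test_paths` and the assignment order are assumptions made for illustration; skfolio may order or assign the pieces differently.

```python
from itertools import combinations


def build_test_paths(n_folds, n_test_folds):
    """Return the test-fold combinations and the paths recombined from them.

    Each path is a list with one entry per fold: the index of the combination
    whose out-of-sample prediction is used for that fold.
    """
    combos = list(combinations(range(n_folds), n_test_folds))
    # For each fold, the combinations (by index) that hold it out for testing.
    tested_in = {
        fold: [i for i, combo in enumerate(combos) if fold in combo]
        for fold in range(n_folds)
    }
    n_paths = len(tested_in[0])  # identical for every fold, by symmetry
    # Path j uses, for every fold, the j-th combination that tests on that fold.
    paths = [[tested_in[fold][j] for fold in range(n_folds)] for j in range(n_paths)]
    return combos, paths


combos, paths = build_test_paths(n_folds=9, n_test_folds=7)
print(f"{len(combos)} combinations -> {len(paths)} test paths")
print("first path uses combinations:", paths[0])
```

Every path covers all folds exactly once, so each one is a complete out-of-sample history, even though its pieces come from models fitted on different training folds.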

## Distribution

We plot the out-of-sample distribution of Sharpe Ratio for the NCO model:

```
pred_nco.plot_distribution(measure_list=[RatioMeasure.ANNUALIZED_SHARPE_RATIO])
```

Let’s print the average and standard deviation of the out-of-sample Sharpe Ratios:

```
print(
    "Average of Sharpe Ratio :"
    f" {pred_nco.measures_mean(measure=RatioMeasure.ANNUALIZED_SHARPE_RATIO):0.2f}"
)
print(
    "Std of Sharpe Ratio :"
    f" {pred_nco.measures_std(measure=RatioMeasure.ANNUALIZED_SHARPE_RATIO):0.2f}"
)
```

```
Average of Sharpe Ratio : 0.86
Std of Sharpe Ratio : 0.18
```

Let’s compare it with the benchmark:

```
pred_bench = benchmark.fit_predict(X_test)
print(pred_bench.annualized_sharpe_ratio)
```

```
1.0507476631082548
```
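One rough way to read these numbers together is to express the gap between the benchmark and the NCO average in units of the NCO standard deviation. The snippet below does that arithmetic with the rounded values printed above. It is only a back-of-the-envelope comparison: the benchmark figure is a single Sharpe ratio over the whole test set, not a distribution over test paths.

```python
# Rounded values taken from the outputs above.
mean_nco = 0.86
std_nco = 0.18
benchmark_sharpe = 1.05

# Distance of the benchmark from the NCO average, in NCO standard deviations.
gap_in_std = (benchmark_sharpe - mean_nco) / std_nco
print(f"Benchmark is {gap_in_std:0.2f} standard deviations above the NCO average")
```

The benchmark therefore sits roughly one standard deviation above the mean of the NCO distribution.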

## Conclusion

This NCO model outperforms the Benchmark in terms of Sharpe Ratio on the historical test set. However, the distribution analysis on the recombined (non-historical) test sets shows that it slightly underperforms the Benchmark on average.

This was a toy example to present the API. Further analysis using different
estimators, datasets and CV parameters should be performed to determine whether the
outperformance on the historical test set is due to chance or whether this NCO model
is able to exploit time-dependency information that is lost in `CombinatorialPurgedCV`.
