Running a backtest is good. **Being able to trust it is better**! That’s where robustness testing comes in. With different methods we can estimate the **probability of overfitting**, the probability of obtaining a Sharpe ratio higher than 1, and much more. Several robustness testing methods exist: combinatorial purged cross-validation is one example, Monte Carlo simulation is another, and there are many others. In this article we will focus on **combinatorial purged cross-validation (CPCV)**.

## 1. ROBUSTNESS TESTS ARE BACKTESTING RELIABILITY TESTS

As I started to explain in the introduction, **robustness tests in trading** help you measure the reliability of your backtest. Indeed, when we look at a backtest, several questions come up, and the most frequent is “**Are these performances just due to randomness?**”, in other words, to luck.

And that’s a good question. The historical path is only one of the paths the past could have taken, and it may be the only one on which the strategy performs well. Of course, **there is an infinity of possible paths the past could have taken**. So, we need to simulate different paths based on the historical data.

And I insist on this point, “**using the historical data**”: you need to generate data from the same distribution as the historical data, or resample the historical path. You can’t just draw from a normal distribution and test your strategy on it. You can also use more complex methods such as a generative adversarial network (GAN).
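To make “resample the historical path” more concrete, here is a minimal sketch of a block bootstrap in Python. It is only one possible resampling scheme, not necessarily the one used in this series; the function name `block_bootstrap`, the `block_size` parameter and the toy returns are assumptions made for the illustration.

```python
import random

def block_bootstrap(returns, block_size, seed=None):
    """Build one alternative path by stitching together random blocks of historical returns."""
    n = len(returns)
    if not 0 < block_size <= n:
        raise ValueError("block_size must be between 1 and len(returns)")
    rng = random.Random(seed)
    path = []
    while len(path) < n:
        # Pick a random starting point and copy a contiguous block of history.
        start = rng.randrange(0, n - block_size + 1)
        path.extend(returns[start:start + block_size])
    return path[:n]  # trim the last block so the new path has the original length

# Toy historical returns (hypothetical values, for illustration only).
historical_returns = [0.01, -0.02, 0.015, 0.003, -0.007, 0.02, -0.01, 0.004]
new_path = block_bootstrap(historical_returns, block_size=3, seed=42)
print(new_path)
```

Drawing whole blocks instead of single observations keeps some of the short-term dependence of the original series, which a plain shuffle would destroy.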

## 2. COMBINATORIAL PURGED CROSS VALIDATION IN TRADING

If you are here, the odds are high that you have already heard about **cross-validation**. Standard cross-validation works very well… for data that is not a time series. However, asset prices are time series, so how do we deal with that? We use **purged cross-validation**.

Purged cross-validation removes a small part of the data wherever we switch from an out-of-sample segment to an in-sample segment, and vice versa. The goal is to **avoid the leakage of information between the different samples**. Indeed, time series observations generally carry information from past data (with technical indicators it is obvious: a moving average contains information from the previous periods).
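The purging idea can be sketched in a few lines of Python. This is a simplified illustration that assumes a fixed purge window of `purge` observations on each side of the out-of-sample block; the function name and the numbers are hypothetical, and in practice the window should cover at least the look-back of your longest indicator.

```python
def purged_train_indices(n_obs, test_idx, purge):
    """Return in-sample indices, dropping `purge` observations on each side of the test block."""
    banned = set()
    for t in test_idx:
        # Ban the test observation itself and every index within `purge` steps of it.
        banned.update(range(t - purge, t + purge + 1))
    return [i for i in range(n_obs) if i not in banned]

# 20 observations, out-of-sample block = indices 8 to 11, purge window = 2.
train_idx = purged_train_indices(n_obs=20, test_idx=range(8, 12), purge=2)
print(train_idx)  # indices 6, 7, 12 and 13 are purged in addition to the test block
```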

The term **combinatorial**, on the other hand, comes from the method we use to create our different paths. We split the time series into N samples and pick K of them as the out-of-sample set (where K<N). **For example, with N=10 and K=2 you get 45 paths**: all the possible combinations of 2 samples among 10.
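The count of 45 can be checked by enumerating the combinations directly. The sketch below only lists which groups are in-sample and which are out-of-sample for each path; the variable names are illustrative, and purging is left out to keep the example short.

```python
from itertools import combinations
from math import comb

N, K = 10, 2  # N groups in total, K of them used as the out-of-sample set

splits = []
for test_groups in combinations(range(N), K):
    # Every group that is not out-of-sample belongs to the in-sample set.
    train_groups = tuple(g for g in range(N) if g not in test_groups)
    splits.append((train_groups, test_groups))

print(len(splits), comb(N, K))  # both are equal to 45
print(splits[0])                # first path: groups 0 and 1 are out-of-sample
```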

**Figure: Illustration of how the CPCV creates different paths**

## 3. THE STEP BY STEP ROBUSTNESS TESTING PROCESS

Understanding the CPCV is good, but let’s explain how we can use it to extract interesting information such as the probability of overfitting (PBO), the probability of obtaining a positive Sharpe ratio (PPSR), and much more.

### STEP 1: COMBINATORIAL PURGED CROSS VALIDATION

The first thing to do is to run the combinatorial purged cross-validation to obtain some results. So, **you will have many paths, each with an in-sample set and an out-of-sample set**. Moreover, as with the walk forward optimization, you need to decide which parameters you want to optimize and which criterion you will use to decide which set of parameters is the best.

Let’s take the same example as for the walk forward optimization. We want to **optimize the SMA period and the RSI period of a strategy**. And the **criterion** used to rank our parameters will be the **Sharpe ratio**.

Then, you apply every set of parameters to all the samples (in-sample and out-of-sample). For each in-sample/out-of-sample pair (one path), **you will have two tables**: one with the parameters and the associated criterion for the in-sample set, and the same for the out-of-sample set.
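Here is a minimal sketch of how the two tables of one path could be built. It assumes you already have the per-period returns of the strategy for each parameter set; the returns below are toy values, the (SMA, RSI) pairs are hypothetical, and the Sharpe ratio is neither annualised nor adjusted for a risk-free rate.

```python
from statistics import mean, stdev

def sharpe_ratio(returns):
    """Non-annualised Sharpe ratio, with the risk-free rate assumed to be 0."""
    sd = stdev(returns)
    return mean(returns) / sd if sd > 0 else 0.0

def criterion_table(returns_by_params, idx):
    """Sharpe ratio of every parameter set on the observations in `idx`, best first."""
    rows = [(params, sharpe_ratio([rets[i] for i in idx]))
            for params, rets in returns_by_params.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)

# Toy per-period strategy returns for three hypothetical (SMA, RSI) pairs.
returns_by_params = {
    (60, 13): [0.01, 0.02, -0.01, 0.03, 0.00, 0.01, -0.02, 0.02],
    (30, 14): [0.03, 0.04, 0.02, 0.05, -0.03, -0.02, -0.04, 0.01],
    (90, 10): [-0.01, 0.00, 0.01, -0.02, 0.02, 0.01, 0.03, 0.00],
}
in_sample_idx, out_sample_idx = range(0, 4), range(4, 8)

is_table = criterion_table(returns_by_params, in_sample_idx)
oos_table = criterion_table(returns_by_params, out_sample_idx)
print(is_table)
print(oos_table)
```

In this toy path, the pair (30, 14) is the best in-sample and the worst out-of-sample, which is exactly the kind of behaviour the next step tries to detect.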

**Figure: One possible path output from the CPCV**

### STEP 2: PROBABILITY OF OVERFITTING

Once we have all these tables (one for the in-sample set and one for the out-of-sample set of each path: so, **if we have 50 paths, 100 tables**), we need to extract information from them. If you want to go deeper and look at the math behind this, I put the research paper associated with this section in the bibliography.

FOR EACH IN-SAMPLE SET, we take the **best combination of parameters**, then we check the rank of this set of parameters in the **ordered out-of-sample table**. In our previous example, the best parameters were (60,13). If we have 50 possible combinations and the parameters (60,13) are ranked 10th out of 50 out of sample, that is pretty good. For the sign of the logit to be meaningful, the rank has to be counted from the worst combination: the 10th best out of 50 is the 41st from the bottom, so the **relative rank is 41/50** here (rank / number_of_combinations). Now, we need to convert this relative rank into a logit, computed as log(relative_rank / (1 - relative_rank)), here log(0.82 / 0.18) ≈ 1.52. Note that the best possible rank gives a relative rank of 1 and an infinite logit; a common fix is to divide by number_of_combinations + 1 instead. You now have the **logit for one path**. Iterate over all the paths and you obtain a distribution of logits.

**If a logit is above 0, the path is considered non-overfitted**. So, the PBO is the number of negative logits divided by the total number of logits. A PBO below 15% starts to be really good. But the PBO alone doesn’t mean anything: **we need to pair it with a measure of performance**, for example the probability of obtaining a positive Sharpe ratio.
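The whole computation can be sketched as follows. Two conventions are assumptions of this sketch: ranks are counted from the worst combination, so that a positive logit means the best in-sample parameters landed in the top half out of sample, and the rank is divided by the number of combinations plus one so the logit stays finite for the top rank. The parameter labels `p1` to `p4` and the Sharpe ratios are toy values.

```python
from math import log

def path_logit(is_scores, oos_scores):
    """Logit of the out-of-sample relative rank of the best in-sample parameters."""
    best_is = max(is_scores, key=is_scores.get)
    # Rank counted from the worst (1 = worst OOS score), so a high rank is good.
    ascending = sorted(oos_scores, key=oos_scores.get)
    rank = ascending.index(best_is) + 1
    # Assumption: divide by n + 1 instead of n to keep the logit finite.
    relative_rank = rank / (len(oos_scores) + 1)
    return log(relative_rank / (1 - relative_rank))

def probability_of_overfitting(logits):
    """Share of paths whose logit is negative."""
    return sum(1 for x in logits if x < 0) / len(logits)

# Four toy paths: (in-sample Sharpe ratios, out-of-sample Sharpe ratios).
paths = [
    ({"p1": 1.2, "p2": 0.8, "p3": 0.1, "p4": 0.5}, {"p1": 0.9, "p2": 0.2, "p3": -0.3, "p4": 0.4}),
    ({"p1": 1.5, "p2": 0.4, "p3": 0.6, "p4": 0.2}, {"p1": -0.5, "p2": 0.7, "p3": 0.1, "p4": 0.3}),
    ({"p1": 0.3, "p2": 1.1, "p3": 0.2, "p4": 0.6}, {"p1": 0.1, "p2": 0.6, "p3": 0.4, "p4": 0.2}),
    ({"p1": 0.2, "p2": 0.5, "p3": 0.9, "p4": 0.7}, {"p1": 0.3, "p2": 0.8, "p3": 0.5, "p4": 0.1}),
]

logits = [path_logit(is_scores, oos_scores) for is_scores, oos_scores in paths]
pbo = probability_of_overfitting(logits)
print(logits)
print(pbo)  # 0.25: only the second path has a negative logit
```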

**Figure: Logit distribution with the associated PBO**

### STEP 3: PROBABILITY OF POSITIVE SHARPE RATIO

Fortunately, this metric is much easier to compute. **You can do the same with any other metric** (we will do something similar with the drawdown in the next article).

To obtain the **probability of having a positive Sharpe ratio**, or a Sharpe ratio above a certain threshold, we first need to extract the out-of-sample Sharpe ratio associated with the best in-sample parameters. Keeping our previous example, we take the OOS Sharpe ratio associated with the parameters (60,13), **here 0.97**.

When you do that for all the paths, you obtain a distribution of Sharpe ratios, from which you can compute the probability of obtaining a Sharpe ratio above a certain threshold (t). Indeed, **P(SR > t) = number of Sharpe ratios higher than t / total number of Sharpe ratios**.
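In code, this is a one-line count. In the sketch below, the list of out-of-sample Sharpe ratios is made of toy values (only 0.97 comes from the example above) and the function name is hypothetical.

```python
def prob_sharpe_above(oos_sharpes, threshold=0.0):
    """Share of paths whose out-of-sample Sharpe ratio is strictly above `threshold`."""
    return sum(1 for sr in oos_sharpes if sr > threshold) / len(oos_sharpes)

# One OOS Sharpe ratio per path, each taken at the best in-sample parameters.
oos_sharpes = [0.97, 1.30, -0.20, 0.45, 1.10]

print(prob_sharpe_above(oos_sharpes))                 # 0.8 -> 4 paths out of 5 are positive
print(prob_sharpe_above(oos_sharpes, threshold=1.0))  # 0.4 -> 2 paths out of 5 are above 1
```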

## 4. THE BENEFITS OF THE ROBUSTNESS TESTS

**Robustness tests in trading** are powerful: they help you understand how good your historical path backtest is compared to the other possible paths. Let me give you a few benefits:

- **Quantify overfitting**: when you optimize the parameters with the walk forward optimization, you have no idea whether your results are overfitted or not. Running the CPCV with the same parameters as the walk forward optimization gives you an idea of how overfitted your backtest is.
- **Multiple paths backtesting**: thanks to the combinatorial method, it is relatively easy to obtain a lot of different paths (50, 100, 1000…). This gives us enough simulations of what the past could have been to have reliable data to analyze.
- **Unlimited usage**: once you have run the CPCV, you can compute the PBO and the PPSR, but you can also create any other metric you want. Your only limit is your imagination.

## 5. THE ROBUSTNESS TESTS LIMITATIONS

There are also some **limitations of robustness testing in trading**, so you can combine this method with others to limit these problems.

- **No best parameters**: if you only run a robustness test, you will not be able to find the best parameters for putting your strategy into live trading. It only gives an overview of the range of performances you can obtain depending on the path used.
- **Amount of data**: you need enough data to obtain a reliable output. Indeed, anyone can obtain a non-overfitted strategy on a one-week backtest, but can you do the same on 10 years of history?
- **Historical paths**: even if we have resampled our dataset as much as we can, it is still historical data. To fix that, you can generate new paths, for example with a Monte Carlo simulation or a GAN.

So, robustness testing is essential in backtesting to verify the reliability of your backtest. But you need to combine it with another method such as the walk forward optimization. Moreover, this method can be improved in different ways: finding a better way to pick the best parameters than only ordering them by the criterion, or doing the same on generated data.

If you have any questions, feel free to ask them on my public Discord forum or directly in private messages on LinkedIn.

## BIBLIOGRAPHY

Lucas Inglese

Lucas is a self-taught Quantitative Analyst, holding degrees in Mathematics and Economics from the University of Strasbourg. Embarking on an independent learning journey, he delved deeply into data science and quantitative finance, eventually mastering the disciplines. Lucas has developed numerous bots, sharing his insights and expertise on LinkedIn, where he regularly posts content. His understanding and empathy for beginners in this complex field led him to author several books and create the comprehensive “Alpha Quant Program.” This program offers e-learning videos, monthly projects, and continuous 7-day-a-week support. Through his online courses and publications, Lucas has successfully guided over 67,000 individuals in their pursuit of knowledge in quantitative finance.
