The Scientific Backtesting Guide

If you are here, you already know that running a strategy once over historical data is clearly not enough to establish its reliability. And when I say not enough, I mean that if you rely only on historical data, you will lose a lot of money. The role of this backtesting guide is to show you how I backtest my trading strategies. We will look at several backtesting techniques and how to combine them. (The list of methods I give here is, of course, non-exhaustive.)

You can find many templates to improve your backtesting methods in the Alpha Quant Program.



Backtesting is the heart of every algorithmic and quantitative trading strategy. It helps you understand the strengths and weaknesses of your strategy: in other words, whether the strategy is good or belongs in the trash. Check the next article (available on 02/03/24) if you need more information about what a backtest is in trading and why we do it.

The problem is that no single method guarantees a profitable trading strategy. In reality, the goal of a backtest is to apply as many tests as possible in order to minimize the odds that a strategy with a good backtest turns out to be unprofitable in live trading.

You need to think of backtesting like a hypothesis test in statistics: we are never certain that the null hypothesis is false, we only accept an error threshold. In trading, because market conditions keep changing, your backtest will never fully match reality. The goal is to analyze the trading strategy with many methods in order to check for potential overfitting, how the strategy performs in unfavorable market conditions, how it performs on the different paths the past could have taken…


The first method we will apply is walk-forward optimization. This method is the easiest: we run several standard backtests in sequence (in-sample optimization followed by an out-of-sample test) so that the strategy parameters are re-optimized through time.

Figure: Visual explanation of the walk-forward optimization (a stock price series with different areas highlighted)

The clear advantage is that we get a standard backtest on historical data with a much longer out-of-sample period. Moreover, with this method we have backtested not only the strategy but also the way we update its parameters, which is very valuable if you keep the strategy running in live trading.
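To make the mechanics concrete, here is a minimal sketch of a rolling walk-forward loop. Everything in it is an assumption made for illustration: the toy momentum rule, the candidate lookbacks, the window sizes, the use of the in-sample Sharpe ratio as the selection criterion, and the random returns standing in for real data. It is not the exact setup used for the results in this article.

```python
import numpy as np

def walk_forward_windows(n_obs, train_size, test_size):
    """Yield (train_idx, test_idx) index arrays for rolling walk-forward windows."""
    start = 0
    while start + train_size + test_size <= n_obs:
        train_idx = np.arange(start, start + train_size)
        test_idx = np.arange(start + train_size, start + train_size + test_size)
        yield train_idx, test_idx
        start += test_size  # roll forward by one out-of-sample block

def sharpe(returns):
    """Per-period Sharpe ratio (not annualized); 0 when volatility is zero."""
    sd = returns.std()
    return returns.mean() / sd if sd > 0 else 0.0

def momentum_returns(asset_returns, lookback):
    """Hypothetical toy strategy: long when the trailing mean return is positive."""
    r = np.asarray(asset_returns, dtype=float)
    csum = np.concatenate(([0.0], np.cumsum(r)))
    signal = np.zeros_like(r)
    # The position at day t only uses returns from t-lookback to t-1 (no look-ahead).
    signal[lookback:] = (csum[lookback:-1] - csum[:-lookback - 1]) > 0
    return signal * r

def walk_forward(asset_returns, lookbacks, train_size, test_size):
    """Pick the best lookback on each in-sample window, then trade it out of sample."""
    r = np.asarray(asset_returns, dtype=float)
    strat = {lb: momentum_returns(r, lb) for lb in lookbacks}
    oos_blocks, chosen = [], []
    for train_idx, test_idx in walk_forward_windows(len(r), train_size, test_size):
        best = max(lookbacks, key=lambda lb: sharpe(strat[lb][train_idx]))
        chosen.append(best)
        oos_blocks.append(strat[best][test_idx])
    return np.concatenate(oos_blocks), chosen

# Random returns used as a placeholder for real historical data.
rng = np.random.default_rng(0)
asset_returns = rng.normal(0.0003, 0.01, 1500)
oos_returns, chosen = walk_forward(asset_returns, [10, 20, 50], 500, 100)
print(len(oos_returns), chosen)
```

The concatenated out-of-sample blocks form one long out-of-sample track record, and `chosen` shows how the parameter would have been updated through time.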

Figure: Backtest result from a walk-forward backtest (equity curve of a Python backtest showing a clear upward trend)



The probability of overfitting can look like the most important part of this backtesting process. It is important, yes, but it is as important as the other tests, not more. Cross-validation makes it easy to compute a probability of overfitting; more precisely, we use a combinatorial purged cross-validation because prices are time series.
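As a rough illustration of what "combinatorial" and "purged" mean here, the sketch below cuts the sample into contiguous groups, uses every combination of groups as a test set, and drops the training observations that sit right next to a test group. The function name, the number of groups and the symmetric purge width are assumptions chosen for the example; in practice the purge (and any embargo) depends on how far your features and labels overlap in time.

```python
from itertools import combinations
import numpy as np

def combinatorial_purged_splits(n_obs, n_groups=6, n_test_groups=2, purge=5):
    """Yield (train_idx, test_idx) for every combination of test groups.

    Training observations within `purge` bars of a test group are removed
    so that information cannot leak across the boundary.
    """
    groups = np.array_split(np.arange(n_obs), n_groups)
    for test_ids in combinations(range(n_groups), n_test_groups):
        test_idx = np.concatenate([groups[g] for g in test_ids])
        keep = np.ones(n_obs, dtype=bool)
        for g in test_ids:
            lo, hi = groups[g][0], groups[g][-1]
            # Remove the test group itself plus a purge band on both sides.
            keep[max(0, lo - purge):min(n_obs, hi + purge + 1)] = False
        yield np.flatnonzero(keep), test_idx

splits = list(combinatorial_purged_splits(600))
print(len(splits))  # number of IS/OOS pairs generated
```

With 6 groups and 2 test groups per split, this gives C(6, 2) = 15 in-sample/out-of-sample pairs from a single historical series.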

Figure: Possible past values for the asset price (a trading chart with different areas highlighted)

Once we have backtested the strategy on every OOS (out-of-sample) set after optimizing on the matching IS (in-sample) set, we can build the distribution of logits. These logits are created by testing all the parameter combinations on the IS and the OOS sets: if a logit is above 0, we consider that case not overfitted. The detailed method will come in one of the next articles.

Once we have all the logits (we can create as many samples as we want), we compute their distribution to obtain the probability of having a logit above 0. The closer this probability is to 1, the better. The probability of overfitting (PBO) is one minus the probability of having a logit above 0.

But be careful: here we are talking about overfitting, not profitability. A strategy can have a PBO of 0 and still be unprofitable, and vice versa.
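Since the detailed method is left for a later article, the following is only a sketch of one common way to build such logits, assumed here for illustration: for each split, take the parameter set that is best in sample, look at its relative rank out of sample, and turn that rank into a logit. The function names and the random performance matrices are hypothetical placeholders.

```python
import numpy as np

def logits_from_performance(is_perf, oos_perf):
    """is_perf, oos_perf: arrays of shape (n_splits, n_configs), e.g. Sharpe ratios."""
    is_perf = np.asarray(is_perf, dtype=float)
    oos_perf = np.asarray(oos_perf, dtype=float)
    n_splits, n_configs = is_perf.shape
    best_is = is_perf.argmax(axis=1)  # best parameter set in sample, per split
    # OOS rank of every parameter set (1 = worst, n_configs = best).
    oos_ranks = oos_perf.argsort(axis=1).argsort(axis=1) + 1
    # Relative OOS rank of the IS winner, strictly between 0 and 1.
    w = oos_ranks[np.arange(n_splits), best_is] / (n_configs + 1)
    return np.log(w / (1.0 - w))

def probability_of_overfitting(logits):
    """PBO = 1 - P(logit > 0), i.e. the share of logits at or below 0."""
    return float(np.mean(np.asarray(logits) <= 0))

# Random numbers standing in for real IS/OOS results (15 splits, 10 parameter sets).
rng = np.random.default_rng(0)
is_perf = rng.normal(size=(15, 10))
oos_perf = rng.normal(size=(15, 10))
logits = logits_from_performance(is_perf, oos_perf)
pbo = probability_of_overfitting(logits)
print(round(pbo, 2))
```

A logit above 0 means the in-sample winner also finished in the top half out of sample, which matches the "above 0 is not overfitted" reading used in this article.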

Figure: Possible distribution of the logits (histogram with a highlighted threshold: the PBO)



Still using the simulations from the combinatorial purged cross-validation (CPCV), we can compute several other metrics. The one we will talk about is the probability of a positive Sharpe ratio. It does not need much explanation: to compute it, we take all the OOS Sharpe ratios coming from the CPCV and compute the percentage of them that are positive.

This metric is meant to be combined with the PBO. They are two sides of the same coin: we need a low PBO and a high PPSR (probability of positive Sharpe ratio). I would even say that a strategy with a high PBO and a high PPSR is better than a strategy with a low PBO and a low PPSR. In the first case we earn money, just not in an optimal way; in the second we do not earn money.

Moreover, you can also use the probability of a profitable Sharpe ratio, which is quite similar. Instead of counting the positive Sharpe ratios, you count the Sharpe ratios greater than a threshold (0.5 or 1, for example).

Generally, I try to have a PPSR close to 1 and a probability of profitable Sharpe ratio greater than 60% or 70%, depending on the threshold chosen.
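Both probabilities are simple proportions, as the short sketch below shows. The list of Sharpe ratios is made up for the example; in practice it would hold the OOS Sharpe ratios coming out of the CPCV, and the function name is only a suggestion.

```python
import numpy as np

def sharpe_probabilities(oos_sharpes, threshold=0.5):
    """Return (PPSR, probability of a Sharpe ratio above `threshold`)."""
    s = np.asarray(oos_sharpes, dtype=float)
    ppsr = float(np.mean(s > 0))
    p_profitable = float(np.mean(s > threshold))
    return ppsr, p_profitable

# Made-up OOS Sharpe ratios, one per CPCV path.
oos_sharpes = [1.2, 0.8, -0.1, 0.4, 1.5, 0.6, 0.9, 0.3, 1.1, 0.7]
ppsr, p_profitable = sharpe_probabilities(oos_sharpes, threshold=0.5)
print(ppsr, p_profitable)
```

Here 9 of the 10 values are positive and 7 are above 0.5, so this made-up strategy would pass a 60% or 70% requirement on the profitable Sharpe ratio but would fall a little short of a PPSR close to 1.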


Risk management, backtesting, derivatives pricing… Monte Carlo simulations are one of the best-known tools in finance and trading. The idea behind them is simple and at the same time very powerful.



We know that asset prices are random, so the past is only one sample from the price distribution. So why not use the characteristics of the price series to generate more samples and apply the strategy to them? This lets us backtest our strategy on different versions of what the past could have been. (We generally do it using the mean and the standard deviation, but we can apply more complex methods such as GANs.)

If we do that, we get many series of strategy returns over a given period (one month, one year, ten years…). Most importantly, the prices we have applied our strategy to are drawn from a distribution estimated on the historical path.
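Here is a minimal sketch of the mean-and-standard-deviation version of this idea. It assumes independent, normally distributed daily returns, which is a strong simplification of real markets, and it reuses a hypothetical toy momentum rule in place of your own strategy. The random "historical" series is only a placeholder.

```python
import numpy as np

def simulate_return_paths(hist_returns, n_paths=500, n_days=252, seed=0):
    """Draw i.i.d. Gaussian return paths from the historical mean and std."""
    r = np.asarray(hist_returns, dtype=float)
    mu, sigma = r.mean(), r.std(ddof=1)
    rng = np.random.default_rng(seed)
    return rng.normal(mu, sigma, size=(n_paths, n_days))

def toy_strategy_returns(path, lookback=20):
    """Hypothetical strategy: long when the trailing mean return is positive."""
    csum = np.concatenate(([0.0], np.cumsum(path)))
    signal = np.zeros_like(path)
    # Position at day t only uses returns from t-lookback to t-1.
    signal[lookback:] = (csum[lookback:-1] - csum[:-lookback - 1]) > 0
    return signal * path

# Placeholder for real historical daily returns.
hist = np.random.default_rng(1).normal(0.0004, 0.01, 1000)
paths = simulate_return_paths(hist)
strategy_paths = np.array([toy_strategy_returns(p) for p in paths])
equity = np.cumprod(1.0 + strategy_paths, axis=1)  # one equity curve per path
print(equity[:, -1].mean())
```

Each row of `equity` is one possible one-year outcome for the strategy, which is the kind of fan of curves shown in the figure for this section.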

Figure: Strategy returns on simulations based on the historical data distribution (line chart with hundreds of lines sharing the same trend)


Using our previous Monte Carlo simulations, you can compute a risk of ruin. It works like the PPSR (probability of positive Sharpe ratio): once you have the simulations, you aggregate them to compute the probability of having lost x% of your capital after N days.

So, if you have decided that after a 20% loss you will close all your positions to take a break and think about what went wrong, you can compute the probability of ending up in this situation. Of course, the lower the probability, the better.

For example, on the previous Monte Carlo simulation, the probability of a drawdown greater than 20% after 1 year is equal to 0 (based on our simulated paths; in reality we know it is a bit higher, for example if an unexpected crisis occurs).
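The aggregation step can be sketched as follows, assuming the risk of ruin is measured as the share of simulated paths whose maximum drawdown reaches the chosen loss level within the simulated horizon. The function names and the random strategy returns are placeholders for the example.

```python
import numpy as np

def max_drawdowns(strategy_returns):
    """Maximum drawdown of each simulated path (rows = paths, columns = days)."""
    equity = np.cumprod(1.0 + np.asarray(strategy_returns, dtype=float), axis=1)
    # Prepend the starting capital of 1 so that a loss from day one is counted.
    equity = np.hstack([np.ones((equity.shape[0], 1)), equity])
    running_max = np.maximum.accumulate(equity, axis=1)
    return (1.0 - equity / running_max).max(axis=1)

def risk_of_ruin(strategy_returns, max_loss=0.20):
    """Share of paths whose maximum drawdown reaches `max_loss`."""
    return float(np.mean(max_drawdowns(strategy_returns) >= max_loss))

# Placeholder simulated strategy returns: 1000 paths of 252 trading days.
rng = np.random.default_rng(0)
simulated = rng.normal(0.0005, 0.01, size=(1000, 252))
ruin = risk_of_ruin(simulated, max_loss=0.20)
print(ruin)
```

Changing `max_loss` or the number of columns lets you ask the same question for another loss level x% or another horizon of N days.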

Figure: Drawdown from the different simulations (histogram with a highlighted threshold: the drawdown distribution)


Now that you have the results for all the different methods, you need to interpret them and, in my opinion, you should not automate this process. Indeed, if you need to automate it, it means that you are testing too many strategies. So far we have not talked about the impact of testing too many trading strategies on the same dataset, but I will cover it when I update this article.

Moreover, when you automate strategy selection, you may reject a trading strategy just because it sits at the edge of one of your thresholds. You need to adapt to the situation: if you want a maximum drawdown of about 10% and your strategy has a maximum drawdown of about 11% while everything else is very good, keep it.

Last but not least, we analyze the output of each method and summarize the results in a table. Having all our previous analyses in one place helps us decide whether or not to keep the trading strategy.

Figure: Example from a validated backtest (sections: Walk forward, SR distribution, Return simulations, Risk of ruin)

I hope this article has given you a quick overview of the complexity of backtesting in trading. It is far more advanced than simply applying your trading strategy to historical data… In the next article, we will explain the biggest backtesting mistakes that 99% of people make.


Lucas Inglese

Lucas is a self-taught Quantitative Analyst, holding degrees in Mathematics and Economics from the University of Strasbourg. Embarking on an independent learning journey, he delved deeply into data science and quantitative finance, eventually mastering the disciplines. Lucas has developed numerous bots, sharing his insights and expertise on LinkedIn, where he regularly posts content. His understanding and empathy for beginners in this complex field led him to author several books and create the comprehensive “Alpha Quant Program.” This program offers e-learning videos, monthly projects, and continuous 7-day-a-week support. Through his online courses and publications, Lucas has successfully guided over 67,000 individuals in their pursuit of knowledge in quantitative finance.
