Data snooping bias

Updated on 2009-08-18

With the tools available in the QuantShare software, it is easy to create and backtest many trading rules or trading systems. You can use the bulk method, which analyzes all the combinations, or an AI method, which uses artificial intelligence algorithms to find the best performing rules.

We have seen in a previous article how to create thousands of trading rules and how to backtest them all in one click (How to create and backtest thousands of trading rules in less than 10 minutes). With enough combinations, you will end up with one or several well-performing rules. (The performance we are talking about here is not necessarily the security return; it is whatever is defined in the output formula.) You might then think that you have found your profitable rules and that all you have to do now is build a trading system and start trading real money with it.

Unfortunately, it is not that easy. The likelihood that the profitable rules you have found are due to pure luck is high, especially if you have backtested thousands of rules.
So what is the solution? How can we make sure that the rules we have picked don't suffer from the data snooping bias?

The data snooping bias is a statistical bias that appears when exhaustively searching for combinations of variables: the probability that a result arose by chance grows with the number of combinations tested.
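
To get a feel for how quickly this probability grows, here is a minimal Python sketch. It assumes, purely for illustration, that the rules are independent of each other and that each worthless rule has a 5% chance of looking good by luck. The function name and both numbers are assumptions and are not part of QuantShare.

```python
def prob_at_least_one_lucky_rule(n_rules: int, p_lucky: float = 0.05) -> float:
    """Chance that at least one of n_rules worthless rules looks good by luck.

    Assumes the rules are independent and each one has the same
    probability p_lucky of passing the backtest by pure chance.
    """
    # Probability that every rule fails, subtracted from 1.
    return 1.0 - (1.0 - p_lucky) ** n_rules


if __name__ == "__main__":
    for n in (1, 10, 100, 1000):
        print(n, prob_at_least_one_lucky_rule(n))
```

Under these assumptions, the chance of finding at least one lucky rule is about 40% with 10 rules and above 99% with 100 rules. Real trading rules are usually correlated, so the exact figures will differ, but the direction is the same.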

Again, unfortunately, there is no way to make sure that these rules don't suffer from this statistical bias, but we can minimize the likelihood that they do. Several techniques exist; in the next paragraph we will explain and show you how to use a simple and popular one called out-of-sample testing.

In order to minimize the probability that our results occurred simply by chance, we can divide the data that we used in the backtesting process into two samples.
The first one is called the in-sample, and it is the data sample used to backtest all the combinations that result from the initial trading rules.
The second one is called the out-of-sample, and it is used to test the best performing rules (the ones that were picked from the in-sample backtesting) on new data. The out-of-sample testing acts as a filter: rules that do not perform as well as they did in the in-sample test are rejected, and only rules that pass both tests are accepted.
This technique greatly decreases the likelihood that the selected rules suffer from the data snooping bias.
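
The two-sample filter described above can be sketched in Python. This is only an illustration under stated assumptions: the market is simulated as pure noise, each rule is a random long/flat signal, and names such as `run_experiment` and `rule_return` are hypothetical. None of this is QuantShare code.

```python
import random


def rule_return(signals, returns):
    """Average daily return of a rule that is long when signal == 1, flat when 0."""
    return sum(s * r for s, r in zip(signals, returns)) / len(returns)


def run_experiment(n_rules=1000, n_days=500, top_n=10, seed=42):
    rng = random.Random(seed)
    # Assumption: daily returns are pure noise, so no rule has a real edge.
    returns = [rng.gauss(0.0, 0.01) for _ in range(n_days)]
    # Assumption: each rule is a random long/flat decision for every day.
    rules = [[rng.randint(0, 1) for _ in range(n_days)] for _ in range(n_rules)]

    split = n_days // 2  # first half is in-sample, second half is out-of-sample
    in_scores = [rule_return(sig[:split], returns[:split]) for sig in rules]
    out_scores = [rule_return(sig[split:], returns[split:]) for sig in rules]

    # Step 1: pick the best rules using the in-sample data only.
    best = sorted(range(n_rules), key=lambda i: in_scores[i], reverse=True)[:top_n]
    # Step 2: keep only the picked rules that are still profitable out-of-sample.
    survivors = [i for i in best if out_scores[i] > 0]
    return in_scores, out_scores, best, survivors


if __name__ == "__main__":
    in_scores, out_scores, best, survivors = run_experiment()
    for i in best:
        print(f"rule {i}: in-sample {in_scores[i]:+.5f}, out-of-sample {out_scores[i]:+.5f}")
    print(f"{len(survivors)} of {len(best)} selected rules pass the out-of-sample filter")
```

Because the simulated market is noise, the rules that rank highest in-sample owe their ranking to luck alone, and comparing the two columns shows whether that luck carries over to new data. The pass criterion used here (a positive out-of-sample return) is a simplification; in practice you would apply your own profitability criteria.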

In the QuantShare application, performing out-of-sample testing is very easy.
In the rules analyzer, for example, after you create your list of rules and the analyzer settings form appears, you can start the in-sample testing process by selecting, for example, a start date of 2005 and an end date of 2007, and keep the 2007-2009 period for the out-of-sample testing.
At the end of the in-sample testing, select the best rules or the rules that meet your criteria of profitability and create a new list of rules. Analyze this newly created list of rules with the out-of-sample period (2007 to 2009).
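
If you prepare data outside the application, the date split can be sketched in Python as follows. The function name `split_by_date`, the sample bars, and the exact split date of 2007-01-01 are assumptions made for illustration. The sketch assigns the boundary date to the out-of-sample period only, so the two samples never share a bar.

```python
from datetime import date


def split_by_date(bars, split_date):
    """Split (date, value) pairs into in-sample and out-of-sample lists.

    Bars strictly before split_date go in-sample; bars on or after it
    go out-of-sample, so no bar belongs to both samples.
    """
    in_sample = [(d, v) for d, v in bars if d < split_date]
    out_of_sample = [(d, v) for d, v in bars if d >= split_date]
    return in_sample, out_of_sample


# Hypothetical closing prices, one bar per year for brevity.
bars = [
    (date(2005, 6, 1), 100.0),
    (date(2006, 6, 1), 110.0),
    (date(2007, 6, 1), 105.0),
    (date(2008, 6, 1), 95.0),
]
in_sample, out_of_sample = split_by_date(bars, date(2007, 1, 1))
print(len(in_sample), "in-sample bars,", len(out_of_sample), "out-of-sample bars")
```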

The periods used in the two samples discussed above must be chosen carefully. It is preferable that no market regime change occurs between the two sample periods. Examples of market regime changes include a shift from a bull market to a bear market, or from inflation to deflation.

You can introduce a third test period: paper trading. If your trading system passes the two filters discussed above, you can paper trade it (trade with no real money), and if it still appears to be profitable, you can switch to real money trading.

QuantShare Blog