Trading and backtesting stocks, futures and options with rules that use high and low prices or limit orders can produce misleading results when the historical quotes database is not clean. During intraday sessions, bad ticks can appear. These tick prices are often far from the actual transaction prices and are easy to detect in a real-time stream. This is not the case for longer-term data such as end-of-day (EOD) data. As a consequence, the high and low prices in EOD quotes are sometimes incorrect, and the historical chart shows big spikes (long shadows) that would not exist if the bad ticks had been cleaned.
With an EOD database, the only solution to clean up these spikes is to guess whether the reported high and low prices are correct or not.
There are two ways to clean up a historical database using QuantShare. The first is to create a script that loops through the quotes data of your symbols, runs a cleaning routine, and then saves the modified data. The other, which we will explain in more detail later, is to create a Post-Script formula directly in the downloader so that quotes are cleaned automatically before they are stored in the database.
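As a rough illustration of the first approach, the Python sketch below loops over a folder of quote files, passes each symbol's bars to a cleaning function, and saves the result in place. It is not QuantShare's scripting API: it assumes, hypothetically, one CSV file per symbol with Open, High, Low and Close columns, and the names `clean_folder` and `PRICE_FIELDS` are illustrative only.

```python
import csv
from pathlib import Path
from typing import Callable, Dict, List

# Hypothetical column names; adapt them to your own file layout.
PRICE_FIELDS = ("Open", "High", "Low", "Close")


def clean_folder(folder: Path, clean: Callable[[List[Dict]], List[Dict]]) -> int:
    """Load each *.csv file in `folder`, pass its bars to `clean`,
    and overwrite the file with the result. Returns the number of files processed."""
    processed = 0
    for path in sorted(folder.glob("*.csv")):
        with path.open(newline="") as f:
            reader = csv.DictReader(f)
            fieldnames = reader.fieldnames
            bars = list(reader)
        # Convert price columns to floats so the cleaning routine can do arithmetic.
        for bar in bars:
            for field in PRICE_FIELDS:
                bar[field] = float(bar[field])
        cleaned = clean(bars)
        # Save the cleaned bars, keeping the original column order.
        with path.open("w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(cleaned)
        processed += 1
    return processed
```

Any cleaning routine that takes and returns a list of bars can be plugged in, which keeps the file handling separate from the filtering logic.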
Several methods can be used to clean quotes of bad ticks. We will detail a simple method that uses a standard deviation threshold to detect potential errors in high and low prices.
First, you need to open the Post-Script form of your downloader. Open your end-of-day quotes downloader (for example, the Yahoo downloader: Yahoo EOD historical quotes), click Settings, then Next, and finally click the 'Post Script' button.
This filtering method is simple: it calculates the standard deviation of the bar range (high - low) over all available quotes, then updates the high and low prices of the bars whose range lies more than 3 standard deviations above the average range. Any other level could also be used. For a normal distribution, there is only about a 5% chance that a value falls more than 2 standard deviations from the mean, and about a 0.3% chance that it falls more than 3 standard deviations away. Bar ranges are never negative and are usually skewed to the right, so these percentages are only a rough guide.
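A minimal Python sketch of the detection step follows. It assumes that a range "exceeds 3 standard deviations" when it lies more than k standard deviations above the mean range, using the population standard deviation over the whole series; the function name `flag_range_spikes` is hypothetical, and this is not the downloadable QuantShare script itself.

```python
import statistics
from typing import List, Sequence


def flag_range_spikes(
    highs: Sequence[float], lows: Sequence[float], k: float = 3.0
) -> List[bool]:
    """Flag bars whose range (high - low) is more than k standard deviations
    above the mean range of the whole series."""
    ranges = [h - l for h, l in zip(highs, lows)]
    mean = statistics.fmean(ranges)
    # Population standard deviation over all available bars.
    std = statistics.pstdev(ranges)
    threshold = mean + k * std
    return [r > threshold for r in ranges]
```

For example, with 19 bars of range 1 and one bar of range 20, only the last bar is flagged at k = 3. Keep in mind that with n bars no single value can lie more than sqrt(n - 1) population standard deviations from the mean, so at least 11 bars are needed before anything can be flagged at k = 3.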
If a range falls outside these boundaries, we must update the high and low prices. A simple and conservative way to do this is to remove the upper and lower shadows completely, so that the new bar consists of its body only: the high becomes the greater of the open and close, and the low the lesser of the two.
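The shadow-removal step can be sketched the same way. This is a hypothetical helper, `remove_shadows`, written under the assumption that `flags` is a per-bar list of booleans marking the bars that failed the range test; it is not taken from the QuantShare script.

```python
from typing import List, Sequence, Tuple


def remove_shadows(
    opens: Sequence[float],
    highs: Sequence[float],
    lows: Sequence[float],
    closes: Sequence[float],
    flags: Sequence[bool],
) -> Tuple[List[float], List[float]]:
    """Return new (highs, lows) in which every flagged bar is reduced to its body."""
    new_highs: List[float] = []
    new_lows: List[float] = []
    for o, h, l, c, bad in zip(opens, highs, lows, closes, flags):
        if bad:
            # Drop both shadows: the body spans from min(open, close) to max(open, close).
            new_highs.append(max(o, c))
            new_lows.append(min(o, c))
        else:
            # Keep bars that passed the range test as they are.
            new_highs.append(h)
            new_lows.append(l)
    return new_highs, new_lows
```

Unflagged bars are returned unchanged, so the function can be applied to the whole series in a single pass.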
The filter script can be downloaded here: Filter high and low spikes. After downloading it, you can find it in the Post-Script form.
Note: This script provides one way to deal with high and low price spikes; several other methods could be used to filter quotes data.