Occasionally, an index value may contain an outlier caused by an error in the data pipeline. Over many periods a single outlier may be insignificant, but in a signal-based implementation one bad value can be enough to trigger a spurious signal.
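To show why one bad value matters, here is a minimal sketch. It assumes a hypothetical threshold rule (go long whenever the spread exceeds 1.5) and made-up spread values. Neither the rule nor the data comes from the original pipeline.

```python
import pandas as pd

# Hypothetical spread series: stable around 1.0, with one bad value at row 5
spread = pd.Series([1.00, 1.02, 0.99, 1.01, 1.00, 3.50, 1.01, 0.98, 1.00, 1.02])

# Hypothetical signal rule: go long whenever the spread exceeds a fixed threshold
THRESHOLD = 1.5
signal = spread > THRESHOLD

# Rows where the rule fires: only the corrupted row triggers a trade
fired_rows = signal[signal].index.tolist()
print(fired_rows)  # [5]
```

The underlying data never justified a trade. The single corrupted row is the only reason the rule fires.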
One solution is to replace the index values with a rolling average. This produces a smoothed version of the data that is significantly less sensitive to outliers.
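As a worked equation (the notation is introduced here and does not come from the original), a trailing rolling average over a window of $w$ rows is the mean of the last $w$ values. If a single value $x_k$ is corrupted by an error $S$, every average whose window contains row $k$ shifts by exactly $S/w$:

```latex
\bar{x}_t = \frac{1}{w} \sum_{i=0}^{w-1} x_{t-i}

% If a single value x_k is corrupted to x_k + S:
\bar{x}'_t - \bar{x}_t =
\begin{cases}
  S / w, & k \le t \le k + w - 1 \\
  0,     & \text{otherwise}
\end{cases}
```

With $w = 3$, a spike shrinks to a third of its size but affects three consecutive rows instead of one.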
Fortunately, this can be done in just one line of code thanks to the pandas package:
import QuantGlobal as qg
import pandas as pd
# Query the data (params holds the request parameters, defined elsewhere)
data = qg.download(params)
# Replace the index values (here, the 'Spread' column) with a 3-row rolling average.
# rolling(3).mean() returns NaN for the first two rows because they lack a full 3-row window,
# so those rows are filled with the original values.
data['Spread'] = data['Spread'].rolling(3).mean().fillna(data['Spread'])
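Running `qg.download(params)` requires the original data source. The sketch below therefore applies the same line to a small hand-made DataFrame with hypothetical values, so you can see what happens to an outlier and to the first two rows.

```python
import pandas as pd

# Stand-in for the downloaded data: a hypothetical DataFrame with a 'Spread' column
data = pd.DataFrame({"Spread": [1.0, 2.0, 3.0, 10.0, 5.0, 6.0]})
original = data["Spread"].copy()

# Same one-liner: 3-row rolling mean, with the NaN rows filled from the raw values
data["Spread"] = data["Spread"].rolling(3).mean().fillna(data["Spread"])

print(data["Spread"].tolist())  # [1.0, 2.0, 2.0, 5.0, 6.0, 7.0]
```

The first two rows keep their raw values. The spike of 10.0 no longer appears as a single jump. Instead it is spread across rows 3 to 5 at a third of its weight. Because `fillna` aligns on the index, the line works with any index, not just a default integer range.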
The charts below illustrate the effect. The blue line shows the index with simulated data errors, and the orange line shows the same calculation without data errors:
Raw data (blue) with processing outliers:
Rolling period of 3:
Rolling period of 6:
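The charts above can be approximated numerically. The following is a minimal sketch that assumes a hypothetical clean index (a sine wave) with isolated +2.0 spikes as the simulated data errors. It measures how much of each spike survives smoothing by comparing the smoothed corrupted series against the smoothed clean one. A window of 1 corresponds to the raw data.

```python
import numpy as np
import pandas as pd


def smooth(series: pd.Series, window: int) -> pd.Series:
    """Rolling mean, with the first window-1 rows filled from the raw series."""
    return series.rolling(window).mean().fillna(series)


# Hypothetical clean index: a gentle sine wave over 100 rows
t = np.arange(100)
clean = pd.Series(np.sin(t / 10.0))

# Simulated data errors: isolated +2.0 spikes, spaced further apart than any window
corrupted = clean.copy()
spike_rows = [20, 45, 70]
corrupted.iloc[spike_rows] += 2.0

# Largest deviation caused by the spikes alone, for each window size
errors = {}
for window in (1, 3, 6):
    diff = smooth(corrupted, window) - smooth(clean, window)
    errors[window] = float(diff.abs().max())
    print(window, round(errors[window], 6))
```

Consistent with the $S/w$ relationship, the worst-case spike error drops from 2.0 (raw) to about 0.667 with a 3-row window and about 0.333 with a 6-row window. The trade-off is that a larger window adds more lag and spreads each error over more rows.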