Solved – Trending time series data normalization for Deep Learning

deep-learning, forecasting, machine-learning, normalization, time-series

I'm replicating the following article, Financial Time Series Prediction using Deep Learning, and I'm stuck with data normalization. In chapter 5.1, in the last sentence of the second paragraph, the authors claim: "Each input sequence was filtered by five taps long, moving uniform averaging, and then normalized by reducing the mean, and dividing by its standard deviation."

I have several questions, specifically under section [5.1] on page 10:

1) What do they mean by "Each input sequence was filtered by five taps long, moving uniform averaging"? I do not completely understand this.

2) "…then normalized by reducing the mean, and dividing by its standard deviation". How do they normalize nonstationary trending price data? They train an ANN with SPY ETF minute prices on 2001-2013 time period and use 60 lags to predict price trends, so how do they compute mean and std? I guess they compute mean and std for each sample that is on 60 lags and then normalize each sample sequence individually. If this's the way they do it then how to normalize test data?

Best Answer

  1. If I read it correctly, they're using only times between 9:30-16:00 (~390 minutes), dropping just about everything that doesn't conform, then chunking each day's trading hours into roughly 300 units (~1.3 minutes of data per sample), and retaining the closing prices from the last 60 minutes, all before the quoted line. Context can be helpful: if things are adding up, then a tap is about 78 seconds, and five of them would cover about 6.5 minutes of data (see the sketch further down).

However, I could be totally wrong. I've tried to parse that paragraph and can see how, by the time the authors reach the quoted line, it's very much a dash of this and a dash of that. Not to pinch too hard at their egos, but considering how well referenced the surrounding text is, it's understandable to be a bit lost over that sentence.

  2. There's a solid Khan Academy video on the standard error of the mean (a.k.a. the standard deviation of the sampling distribution of the sample mean!) which may help in sorting out what the authors were alluding to.

As to how they normalize their test data, it's stated more than once in the linked paper that they feed raw inputs to a network that outputs the probability of movement, direction, and magnitude... well, I think that's what they were getting at. They go deeper into the preprocessing in section [3.3] Preprocessing on page 6, where the network's architecture is described in more detail; also see figures 1 and 2 on that page, which show how the data flows through.
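To put some code behind the arithmetic in point 1, here's a rough sketch of that reading (pandas; the session date, the random-walk prices, and the exact 78-second bucket size are my own stand-ins, not anything the paper confirms):

```python
import numpy as np
import pandas as pd

# Hypothetical one-minute closes for a single 9:30-16:00 session (390 minutes).
idx = pd.date_range("2013-01-02 09:30", periods=390, freq="min")
rng = np.random.default_rng(1)
minute_close = pd.Series(145.0 + np.cumsum(rng.normal(scale=0.02, size=390)), index=idx)

# ~300 buckets over the session -> ~78 seconds per bucket; keep each bucket's last price.
resampled = minute_close.resample("78s").last().dropna()

# 5-tap uniform moving average (spans ~6.5 minutes at this sampling rate),
# then take the last 60 values as one input sequence and z-score it.
smoothed = resampled.rolling(window=5).mean().dropna()
sequence = smoothed.tail(60).to_numpy()
normalized = (sequence - sequence.mean()) / sequence.std()
```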


I may have to come back to this and take a second crack at dissecting their research, but hopefully some of this helps you get a bit of traction.