Recurrent Neural Networks – Proper Use for Time Series Analysis

machine learning, neural networks, time series

Recurrent neural networks differ from "regular" ones in that they have a "memory" layer. Because of this layer, recurrent NNs are supposed to be useful for time series modelling. However, I'm not sure I understand correctly how to use them.

Let's say I have the following time series (from left to right): [0, 1, 2, 3, 4, 5, 6, 7]. My goal is to predict the i-th point using points i-1 and i-2 as input (for each i > 2). In a "regular", non-recurrent ANN I would process the data as follows:

 target| input
      2| 1 0
      3| 2 1
      4| 3 2
      5| 4 3
      6| 5 4
      7| 6 5 

I would then create a net with two input nodes and one output node, and train it with the data above.
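
For concreteness, here is a minimal numpy sketch of how I build these pairs (the `make_windows` helper and the window size are just illustrative):

    import numpy as np

    def make_windows(series, window=2):
        """Build (input, target) pairs with a sliding window.
        Inputs for target i are the `window` preceding values, most recent first."""
        X, y = [], []
        for i in range(window, len(series)):
            X.append(series[i - window:i][::-1])  # [i-1, i-2], matching the table above
            y.append(series[i])
        return np.array(X, dtype=float), np.array(y, dtype=float)

    series = [0, 1, 2, 3, 4, 5, 6, 7]
    X, y = make_windows(series)
    print(X)  # rows: [1, 0], [2, 1], ..., [6, 5]
    print(y)  # [2. 3. 4. 5. 6. 7.]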

How does one need to alter this process (if at all) in the case of recurrent networks?

Best Answer

What you describe is in fact a "sliding time window" approach and is different from recurrent networks. You can use this technique with any regression algorithm. There is a huge limitation to this approach: events in the inputs can only be correlated with other inputs/outputs which lie at most t timesteps apart, where t is the size of the window.

You can think of it as a Markov chain of order t, for example. In theory, RNNs do not suffer from this limitation; in practice, however, learning is difficult.

It is best to illustrate an RNN in contrast to a feedforward network. Consider the (very) simple feedforward network $y = Wx$ where $y$ is the output, $W$ is the weight matrix, and $x$ is the input.

Now consider a recurrent network instead. We have a sequence of inputs, so we will denote the i-th input by $x^{i}$. The corresponding i-th output is then calculated via $y^{i} = Wx^i + W_ry^{i-1}$.

Thus, we have another weight matrix $W_r$ which incorporates the output at the previous step linearly into the current output.
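
As a rough sketch (numpy, with made-up dimensions and random weights), this recurrence unrolls over the sequence like so:

    import numpy as np

    rng = np.random.default_rng(0)
    n_in, n_out = 3, 2                        # illustrative dimensions
    W = rng.normal(size=(n_out, n_in))        # input-to-output weights
    W_r = rng.normal(size=(n_out, n_out))     # recurrent weights on the previous output

    xs = [rng.normal(size=n_in) for _ in range(5)]  # a toy input sequence

    y_prev = np.zeros(n_out)                  # take y^0 = 0
    for x in xs:
        y = W @ x + W_r @ y_prev              # y^i = W x^i + W_r y^{i-1}
        y_prev = y
        print(y)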

This is of course a simple architecture. Most common is an architecture where you have a hidden layer which is recurrently connected to itself. Let $h^i$ denote the hidden layer at timestep i. The formulas are then:

$$h^0 = 0$$ $$h^i = \sigma(W_1x^i + W_rh^{i-1})$$ $$y^i = W_2h^i$$

Here $\sigma$ is a suitable non-linearity/transfer function, such as the sigmoid. $W_1$ and $W_2$ are the weights connecting the input to the hidden layer and the hidden layer to the output, respectively. $W_r$ represents the recurrent weights.
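
Again as a minimal sketch (numpy, sigmoid non-linearity, illustrative sizes and random weights), the forward pass reads:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    rng = np.random.default_rng(0)
    n_in, n_hidden, n_out = 3, 4, 1               # illustrative sizes
    W1 = rng.normal(size=(n_hidden, n_in))        # input -> hidden
    W_r = rng.normal(size=(n_hidden, n_hidden))   # hidden -> hidden (recurrent)
    W2 = rng.normal(size=(n_out, n_hidden))       # hidden -> output

    xs = [rng.normal(size=n_in) for _ in range(5)]  # toy input sequence

    h = np.zeros(n_hidden)                        # h^0 = 0
    for x in xs:
        h = sigmoid(W1 @ x + W_r @ h)             # h^i = sigma(W1 x^i + W_r h^{i-1})
        y = W2 @ h                                # y^i = W2 h^i
        print(y)

Training such a network is typically done with backpropagation through time, which is where the practical difficulty mentioned above comes from.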

Here is a diagram of the structure:

[schematic: input layer → hidden layer (with recurrent self-connection) → output layer]