Solved – Univariate time series multi-step-ahead prediction using a multi-layer perceptron (MLP)

deep learning, keras, perceptron, tensorflow, time series

I have univariate time series data on which I want to do multi-step-ahead prediction. I came across this question, which explains one-step-ahead prediction, but I am interested in predicting several steps ahead. For example, a typical univariate time series looks like this:

    time  value
    ----  ------
    t1      a1
    t2      a2
    ..........
    ..........
    t100    a100

Suppose I want a 3-step-ahead prediction. Can I frame my problem like this:

   TrainX                 TrainY
[a1,a2,a3,a4,a5,a6]   -> [a7,a8,a9]
[a2,a3,a4,a5,a6,a7]   -> [a8,a9,a10]
[a3,a4,a5,a6,a7,a8]   -> [a9,a10,a11]
..................        ...........
..................        ...........
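
For concreteness, a sliding-window helper along these lines would produce such pairs. This is my own sketch; the names make_windows and series are hypothetical:

import numpy as np

def make_windows(series, n_in=6, n_out=3):
    # Split a 1-D series into (6-step input, 3-step output) pairs.
    X, y = [], []
    for i in range(len(series) - n_in - n_out + 1):
        X.append(series[i:i + n_in])                  # e.g. [a1, ..., a6]
        y.append(series[i + n_in:i + n_in + n_out])   # e.g. [a7, a8, a9]
    return np.array(X), np.array(y)

series = np.arange(1.0, 101.0)            # stand-in for a1 ... a100
TrainX, TrainY = make_windows(series)     # shapes: (92, 6) and (92, 3)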

(I am using keras and tensorflow as the backend.)

The first layer has 50 neurons and expects 6 inputs; the hidden layer has 30 neurons; and the output layer has 3 neurons (i.e., it outputs three time series values).

from keras.models import Sequential
from keras.layers import Dense
from keras import regularizers

model = Sequential()
model.add(Dense(50, input_dim=6, activation='relu', kernel_regularizer=regularizers.l2(0.01)))
model.add(Dense(30, activation='relu', kernel_regularizer=regularizers.l2(0.01)))
model.add(Dense(3))  # linear output for 3-step-ahead regression
model.compile(loss='mean_squared_error', optimizer='adam')

model.fit(TrainX, TrainY, epochs=300, batch_size=16)

Is this a valid model? Am I missing something?

Best Answer

This seems reasonable: it is a rolling time window, similar to this question. In terms of predicting 3 time steps instead of 1, it is fine; you can simply output a vector, as you have done.

However, a recurrent neural network (RNN) is better suited to this problem:

  • It handles long-term dependencies (with the 6-time-step window method, the model only ever sees those 6 steps).
  • It handles variable-length sequences as input (how would you predict for time steps 3, 4, and 5 with the window method?).
  • It fits the data: there is parameter sharing over time steps. This is time series data, and generally that means the generative process itself has some parameter sharing. For example, consider a time series of blood pressures. To some extent, at each time step there is some function that takes us from the last 3 time steps to the next one(s). Since the body generally tries to maintain blood pressure within some range of values, this function is probably reading previous values and trying to counteract trends (via the baroreceptor response or something like it). This function is relatively constant over time, however: the body and its mechanisms are fixed. So it makes sense that the mapping from time step 3 to 4 is similar to the mapping from time step 4 to 5, and so on. An RNN captures this elegantly because the parameter taking us from the past to the future is shared over time steps; in your case, the model would only need to estimate a single recurrent parameter (give or take a bias term). With an MLP this is not the case: if you use a 6-dimensional vector as input, the MLP gives each time step its own parameter. It might then find that more recent time steps should be weighted more highly, or some variation of this. That could be achieved with an RNN by adding an extra feature corresponding to "T - t", or it may even be possible without it. Either way, fitting one parameter to the data is (1) less costly and (2) less prone to overfitting: essentially, the RNN parameter sees 6 times more data than each of the 6 MLP parameters. (A minimal LSTM sketch follows this list.)
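
For comparison, a minimal LSTM version of the question's model might look like the sketch below. This is my own sketch, assuming the same 6-step windows, reshaped into the 3-D (samples, time steps, features) layout that Keras recurrent layers expect:

from keras.models import Sequential
from keras.layers import LSTM, Dense

# Keras recurrent layers expect 3-D input: (samples, time steps, features).
TrainX_rnn = TrainX.reshape((TrainX.shape[0], 6, 1))

model = Sequential()
model.add(LSTM(50, input_shape=(6, 1)))  # one shared cell applied at each of the 6 steps
model.add(Dense(3))                      # still predicts the next 3 values at once
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(TrainX_rnn, TrainY, epochs=300, batch_size=16)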

This being said, the window approach sometimes performs better in practice: although a correctly fit LSTM would probably always perform as well or better, the window approach is easier to fit from the human modeler's perspective. Also, CNNs are sometimes used for this kind of analysis and give comparable performance to RNNs; they too have parameter sharing (a kernel shared across time positions).
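
As a rough illustration of the CNN variant (again my own sketch, reusing the reshaped windows TrainX_rnn from the LSTM sketch above):

from keras.models import Sequential
from keras.layers import Conv1D, Flatten, Dense

model = Sequential()
# A small kernel slides over the 6 time steps, so its weights are shared in time.
model.add(Conv1D(32, kernel_size=3, activation='relu', input_shape=(6, 1)))
model.add(Flatten())
model.add(Dense(3))
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(TrainX_rnn, TrainY, epochs=300, batch_size=16)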