Solved – What's the difference between a single-output RNN and an MLP whose input contains all the features of the given time steps?

lstm, neural networks, recurrent neural network

When an RNN/LSTM produces an output at every time step, I can understand that the output at the current time step is a function of the historical data.

But when an RNN only has an output at the last time step, what's the difference between it and an MLP whose first layer takes all the features of all the time steps as input? (e.g., slide 15 of this)

For example, for a single-output machine learning problem with 4 time steps and 2 input features per time step, what's the difference between such an RNN and an MLP with 8 inputs?

[Figure: image not included]

Note that, for convenience, I put all the neurons of the hidden layer into a single hidden layer box.

Any help will be appreciated.

Best Answer

The way I like to describe this is that with an RNN, your net "understands" that its inputs are all the same kind of thing (say, frames of a video) and applies the same transformation, with the same weights, to each of them. For instance, if you are feeding in a video, the input at a given time step is one frame, kept separate from the previous frames. If you feed in all of the frames together, your net might develop connections among unrelated inputs that merely happen to be close to each other in the flattened input (for instance, the first pixels of one frame and the last pixels of the previous one).
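To make the weight-sharing point concrete, here is a minimal NumPy sketch for the 4 time steps by 2 features case from the question. It is an illustration rather than anyone's actual model: the hidden size `H = 3`, the `tanh` activation, the random weights, and the names `rnn_last_state` and `mlp_hidden` are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
T, F, H = 4, 2, 3  # time steps, features per step, hidden units (H is an arbitrary choice)

x = rng.normal(size=(T, F))  # one sequence: 4 time steps, 2 features each

# RNN: a single set of weights, reused at every time step.
W_x = rng.normal(size=(F, H))  # input-to-hidden
W_h = rng.normal(size=(H, H))  # hidden-to-hidden
b = np.zeros(H)

def rnn_last_state(seq):
    """Run a plain tanh RNN over seq and return only the final hidden state."""
    h = np.zeros(H)
    for x_t in seq:  # the same W_x and W_h are applied at each step
        h = np.tanh(x_t @ W_x + h @ W_h + b)
    return h

# MLP: the sequence is flattened to 8 inputs, and each of the 8 positions
# gets its own independent row of weights.
W_mlp = rng.normal(size=(T * F, H))
b_mlp = np.zeros(H)

def mlp_hidden(seq):
    """First hidden layer of an MLP applied to the flattened sequence."""
    return np.tanh(seq.reshape(-1) @ W_mlp + b_mlp)

h_rnn = rnn_last_state(x)
h_mlp = mlp_hidden(x)
print(h_rnn.shape, h_mlp.shape)
```

Both functions return a hidden vector of the same size, but the RNN gets there by applying one learned transformation four times in order, while the MLP learns a separate weight for every (time step, feature) position and has no built-in notion that position 3 and position 5 hold the same kind of feature.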

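Another great benefit of RNNs is that they reuse the same weights at every time step, so they need fewer trainable connections for long sequences and can handle inputs of variable length (videos can be several seconds or several hours long). A feedforward network (FNN), by contrast, needs a fixed input size, so it requires some heavy data manipulation (for example, padding short inputs with zeros) and will not perform well if most of its inputs are 0s.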
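A rough way to check the parameter and input-length claims is to count weights for a plain (vanilla) RNN cell against the first layer of an MLP on the flattened input. The helper names below (`rnn_param_count`, `mlp_first_layer_param_count`, `pad_to_length`) and the hidden size of 8 are hypothetical and chosen only for illustration; an LSTM has roughly four times as many cell parameters as the plain RNN counted here.

```python
import numpy as np

def rnn_param_count(n_features, n_hidden):
    # input-to-hidden + hidden-to-hidden + bias; does not depend on sequence length
    return n_features * n_hidden + n_hidden * n_hidden + n_hidden

def mlp_first_layer_param_count(n_steps, n_features, n_hidden):
    # one weight per (time step, feature, hidden unit) + bias; grows with n_steps
    return n_steps * n_features * n_hidden + n_hidden

def pad_to_length(seq, n_steps):
    """Zero-pad a (t, n_features) sequence to the fixed length an MLP expects."""
    padded = np.zeros((n_steps, seq.shape[1]))
    padded[: len(seq)] = seq
    return padded

n_features, n_hidden = 2, 8
for n_steps in (4, 100, 1000):
    print(
        n_steps,
        rnn_param_count(n_features, n_hidden),
        mlp_first_layer_param_count(n_steps, n_features, n_hidden),
    )

# A 2-step sequence fed to a 4-step MLP: half of the flattened input is zeros.
short_seq = np.ones((2, n_features))
mlp_input = pad_to_length(short_seq, 4).reshape(-1)
print(mlp_input)
```

The RNN count stays constant as the sequence gets longer, while the MLP's first layer grows linearly with the number of time steps. For very short sequences such as the 4-step example, the MLP layer can actually be the smaller of the two, so the saving in connections shows up mainly on long inputs.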