Solved – Why do we need the second tanh() in an LSTM cell

lstm, neural-networks

My question is: why do we apply the tanh() function to C a second time, when it was already applied during the update step (or, if we didn't update it, in a previous LSTM cell)? After all, multiplication by a gate matrix (a sigmoid output) is approximately multiplication by 0 or 1, and if we want to keep more than one unit of information in the memory cell, why apply tanh() after the summation? Or do we not want to pass additional information through the activation units?
LSTM cell
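For reference, the standard LSTM cell equations (matching the usual diagram; notation may differ slightly from the one pictured) are:

$$
\begin{aligned}
f_t &= \sigma(W_f [h_{t-1}, x_t] + b_f) \\
i_t &= \sigma(W_i [h_{t-1}, x_t] + b_i) \\
\tilde{c}_t &= \tanh(W_c [h_{t-1}, x_t] + b_c) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
o_t &= \sigma(W_o [h_{t-1}, x_t] + b_o) \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
$$

The "second tanh" in question is the one in the last line, applied to $c_t$ when computing the hidden state $h_t$.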

Best Answer

An issue with recurrent neural networks is that gradients can explode, because back-propagation through time repeatedly multiplies gradients across timesteps.

After the addition operator, the absolute value of c(t) can grow larger than 1, since the cell state accumulates gated contributions over time. Passing it through a tanh operator rescales the values to between -1 and 1 before they enter the hidden state, thus increasing stability during back-propagation over many timesteps.
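This effect is easy to see numerically. Below is a minimal, untrained LSTM step written with NumPy (random weights; the names `lstm_step`, `W`, `b` are my own, not from any library): after many steps the cell state c can drift outside [-1, 1], while the hidden state h stays bounded because of the second tanh.

```python
import numpy as np

np.random.seed(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM cell step (illustrative sketch, weights untrained)."""
    z = W @ np.concatenate([x, h_prev]) + b
    f, i, o, g = np.split(z, 4)
    f, i, o = sigmoid(f), sigmoid(i), sigmoid(o)  # gates in (0, 1)
    g = np.tanh(g)                 # candidate values, first tanh, in (-1, 1)
    c = f * c_prev + i * g         # after the addition, |c| can exceed 1
    h = o * np.tanh(c)             # second tanh squashes c back into (-1, 1)
    return h, c

n = 4
W = np.random.randn(4 * n, 2 * n)
b = np.zeros(4 * n)
x = np.random.randn(n)
h, c = np.zeros(n), np.zeros(n)

# Feed the same input repeatedly: c accumulates, h stays bounded.
for _ in range(50):
    h, c = lstm_step(x, h, c, W, b)

print("max |c|:", np.abs(c).max())  # may grow beyond 1
print("max |h|:", np.abs(h).max())  # always strictly below 1
```

Because the output gate o is a sigmoid (below 1) and tanh(c) lies in (-1, 1), h is guaranteed to stay in (-1, 1) no matter how large c grows.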