# Solved – the intuition of momentum term in the neural network back propagation

I am going through the derivation of neural network backpropagation using this lecture PDF, and I am stuck on equation $(21)$.

Note on notation:

1. The activation (output) of layer $j$ is $y_j$
2. The weighted sum of inputs to layer $j$ is $x_j$
3. The target label is $t$

I am trying to figure out where

$$\eta \Delta w_{kj} (n-1)$$

comes from in the final equation $(21)$:

$$\Delta w_{kj}(n) = \alpha \delta_j y_k + \eta \Delta w_{kj} (n-1)$$

The author mentioned that it is a momentum term without really elaborating on it.
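In update form, equation $(21)$ can be sketched in a few lines of Python (a hedged illustration, not the lecture's code; `alpha`, `eta`, and the gradient values are made up):

```python
# Sketch of the momentum update from equation (21) for a single scalar
# weight: delta_w(n) = alpha * (gradient term) + eta * delta_w(n-1).
# alpha is the learning rate and eta the momentum factor, following the
# question's notation; the values are illustrative.
def momentum_update(grad_term, prev_delta, alpha=0.1, eta=0.9):
    return alpha * grad_term + eta * prev_delta

# With eta = 0 each step would use only the current gradient term;
# with eta > 0 a fraction of the previous step is carried over, so
# repeated gradients in the same direction build up speed.
delta = 0.0
for grad in [1.0, 1.0, 1.0]:   # the same gradient signal three times
    delta = momentum_update(grad, delta)
# delta is now 0.1 + 0.19*0.9 + 0.1 = 0.271, larger than the
# momentum-free step of 0.1.
```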

I thought the $\Delta w_{kj}$ calculation was the following:

$$\Delta w_{kj} = -\alpha \frac{\partial E}{\partial w_{kj}}$$

for the layer just before the final output layer:

$$\Delta w_{kj} = -\alpha \bigl(-(t_j - y_j)\bigr) y_j (1 - y_j) y_k$$

for all other layers:

$$\Delta w_{kj} = -\alpha \left(\sum_i \delta_i w_{ji}\right) y_j (1 - y_j) y_k$$
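The two momentum-free delta rules above can be sketched as follows (a sketch assuming sigmoid units, using the question's notation; the example values are made up):

```python
# Delta for a unit in the layer just before the output:
# delta_j = (t_j - y_j) * y_j * (1 - y_j), the sigmoid derivative
# times the output error.
def delta_output(t_j, y_j):
    return (t_j - y_j) * y_j * (1 - y_j)

# Delta for a hidden unit: back-propagate the deltas of the next
# layer through the weights w_ji, then multiply by the sigmoid
# derivative y_j * (1 - y_j).
def delta_hidden(y_j, deltas_next, w_ji):
    s = sum(d * w for d, w in zip(deltas_next, w_ji))
    return s * y_j * (1 - y_j)

d_out = delta_output(1.0, 0.5)            # = 0.125
d_hid = delta_hidden(0.5, [d_out], [2.0])  # = 0.0625
# The momentum-free weight change is then alpha * delta_j * y_k.
```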

So what is the momentum term?

Can someone help me out?

Notice that the further back in time an update is, the less it matters for determining your current update. If you expand the momentum recursion at time $t$, the contribution of the update from time $t-k$ is

$$\eta^{k}\,\Delta\omega_{t-k},$$

where $\eta \in [0, 1)$ is the momentum factor from equation $(21)$ and $\Delta\omega_{t-k}$ is the weight update at time $t-k$. Because $\eta < 1$, past updates decay geometrically, so the momentum term is an exponentially weighted average of recent gradient steps: it smooths out oscillations and accelerates movement along directions where the gradient is consistent.
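A quick numerical check of this geometric decay (a sketch, not from the lecture notes; the gradient values and `eta` are made up) expands the recursion $\Delta\omega_t = g_t + \eta\,\Delta\omega_{t-1}$ and confirms that a gradient term from $k$ steps ago is weighted by $\eta^k$:

```python
# Feed a single gradient "kick" at step 0, then zeros, and watch the
# recursion delta_w(n) = g(n) + eta * delta_w(n-1) decay it geometrically.
eta = 0.9
grads = [3.0, 0.0, 0.0, 0.0]   # one kick, then nothing
delta_w = 0.0
history = []
for g in grads:
    delta_w = g + eta * delta_w
    history.append(delta_w)

# history[k] equals 3.0 * eta**k: the step-0 kick's weight shrinks by a
# factor of eta each step, so old updates fade out exponentially.
```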