Solved – In neural network back propagation, how are the weights for one training example related to the weights for the next training example?

backpropagation, gradient descent, machine learning, neural networks

In simple neural network back propagation, we normally use one round of forward and back propagation in every iteration. Assume we have one training example of arbitrary dimensions and some initial weights. Using forward propagation, we calculate the predicted output. This predicted output is used to calculate the total error, which is then back-propagated to recalculate the weights. After recalculating the weights for all the layers, we update the weights for all the layers at once. It is not that we first update the weights of one layer and then the next; instead, we first recalculate the weights of all layers (layer by layer) and then update them all at once. We can summarize this as:

Recalculate the weights layer by layer, then update the weights of all the layers at once with the recalculated values.
Does this make sense? Is it the right way to update weights using back propagation?
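To make the "recalculate layer by layer, then update all at once" idea concrete, here is a minimal sketch for a hypothetical two-layer network with sigmoid activations and squared error. The names (W1, W2, lr), the architecture, and the loss are illustrative assumptions, not taken from the post:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradients(x, y, W1, W2):
    # Forward pass: compute the predicted output.
    a1 = sigmoid(W1 @ x)
    y_hat = sigmoid(W2 @ a1)
    # Backward pass: recalculate gradients for all layers, layer by
    # layer, using the current (old) weights throughout.
    delta2 = (y_hat - y) * y_hat * (1 - y_hat)
    grad_W2 = np.outer(delta2, a1)
    delta1 = (W2.T @ delta2) * a1 * (1 - a1)
    grad_W1 = np.outer(delta1, x)
    return grad_W1, grad_W2

def train_step(x, y, W1, W2, lr=0.1):
    grad_W1, grad_W2 = gradients(x, y, W1, W2)
    # Only after both gradients exist are the layers updated, all at
    # once; no gradient above was computed from an updated weight.
    W1 -= lr * grad_W1
    W2 -= lr * grad_W2
    return W1, W2
```

Note that both gradients are computed before either weight matrix changes, which is exactly the "recalculate everything, then update everything" ordering described above.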

Now assume I have m examples instead of just one. In the case of m examples, each of these small gradient steps is taken after one back propagation iteration over all m examples.

I am confused about how, in the case of m examples, back propagation works on these examples one by one. It first takes the first example and updates the weights. Then it takes the second example and calculates the weights again. Then it takes the third example and calculates the weights, and so on. Finally, when it has run over all the examples, only then does it take a single step towards the optimum point. If that is the case, is there any relation between the weights for one example and the weights for another example, given that back propagation recalculates the weights for each example in sequence?
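In full-batch gradient descent this is essentially what happens, with one important detail: the weights are not updated between examples. Every example's gradient is computed against the same fixed weights, the gradients are averaged, and only then is the single step taken. A sketch of that accumulation, reusing the hypothetical gradients helper from the sketch above:

```python
def batch_step(X, Y, W1, W2, lr=0.1):
    m = len(X)
    # W1 and W2 stay fixed for the whole pass, so every example's
    # gradient is computed from the same weights.
    sum_gW1 = np.zeros_like(W1)
    sum_gW2 = np.zeros_like(W2)
    for x, y in zip(X, Y):
        gW1, gW2 = gradients(x, y, W1, W2)
        sum_gW1 += gW1
        sum_gW2 += gW2
    # One gradient-descent step using the average over all m examples.
    W1 -= lr * sum_gW1 / m
    W2 -= lr * sum_gW2 / m
    return W1, W2
```

So the per-example computations are related only through the shared, unchanged weights they all see; no example's gradient depends on another example's gradient.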

Best Answer

A batch of data is taken for the feed-forward pass, and back-propagation is performed on the examples in that batch. Weights and biases are updated on the basis of the average error over the batch. The weight changes are then applied to the previous weights before feed-forward is performed on the next batch of data. A detailed explanation is given in the following book:

http://neuralnetworksanddeeplearning.com/chap2.html
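Putting the answer's description into a sketch: split the training set into batches, average the gradient over each batch, apply that update, and only then feed forward the next batch. The batch_size value is an illustrative assumption, and the hypothetical gradients helper from the earlier sketch is reused:

```python
def train_epoch(X, Y, W1, W2, lr=0.1, batch_size=32):
    for start in range(0, len(X), batch_size):
        xb = X[start:start + batch_size]
        yb = Y[start:start + batch_size]
        # Average the gradient over this batch with the current weights.
        sum_gW1, sum_gW2 = np.zeros_like(W1), np.zeros_like(W2)
        for x, y in zip(xb, yb):
            gW1, gW2 = gradients(x, y, W1, W2)
            sum_gW1 += gW1
            sum_gW2 += gW2
        # Apply the averaged change before the next batch is fed forward.
        W1 -= lr * sum_gW1 / len(xb)
        W2 -= lr * sum_gW2 / len(xb)
    return W1, W2
```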
