How are weights updated in the batch learning method in neural networks

machine-learning, neural-networks

Can someone please tell me how I am supposed to train a neural network using the batch method?

I have read that, in batch mode, we calculate the error, the delta, and hence the delta weights for each neuron over all samples in the training set, and then, instead of updating the weights immediately, we accumulate the delta weights and apply them once before starting the next epoch.
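If I understand that first description correctly, it would look something like this in Python. This is just my own toy sketch for a single linear neuron with squared error; every name and value here is made up for illustration, not something from the sources I read:

    import numpy as np

    # Toy sketch of the "accumulate delta weights, update once per epoch"
    # recipe, assuming one linear neuron with squared error E = 0.5*(d - y)**2.
    def train_batch_accumulate(w, X, D, alpha, num_epochs):
        for epoch in range(num_epochs):
            delta_w = np.zeros_like(w)          # fresh accumulator each epoch
            for x, d in zip(X, D):
                y = x @ w                       # forward pass for one sample
                delta_w += alpha * (d - y) * x  # this sample's delta weight
            w = w + delta_w                     # apply once, before the next epoch
        return w

    rng = np.random.default_rng(0)
    X = rng.normal(size=(8, 3))
    true_w = np.array([1.0, -2.0, 0.5])
    D = X @ true_w                              # noiseless demo targets
    w = train_batch_accumulate(np.zeros(3), X, D, alpha=0.02, num_epochs=500)
    print(np.round(w, 3))                       # should be close to [1, -2, 0.5]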

I have also read elsewhere that the batch method is like the online method, the only difference being that one sums the errors over all samples in the training set, takes their average, and then uses that average to update the weights just as in the online method, like this:

for epoch = 1 to numberOfEpochs

    SumOfErrors = 0                      // reset the accumulator each epoch

    for all samples i in the training set
        calculate the errors in the output layer
        SumOfErrors += (d[i] - y[i])
    end

    errorAvg = SumOfErrors / numberOfSamples

    update the output layer with errorAvg
    update all other previous layers

end
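
For concreteness, that averaged recipe could be rendered like this for the same toy linear neuron (again, every name and value is a placeholder of mine, not something from the sources I read):

    import numpy as np

    # Toy rendering of the "average the errors, update once" recipe for one
    # linear neuron with squared error; for this unit the output-layer error
    # signal is just (d - y).
    def train_batch_average(w, X, D, alpha, num_epochs):
        n = len(X)
        for epoch in range(num_epochs):
            grad_sum = np.zeros_like(w)
            for x, d in zip(X, D):
                y = x @ w                    # forward pass
                grad_sum += -(d - y) * x     # dE/dw for E = 0.5*(d - y)**2
            w = w - alpha * (grad_sum / n)   # one update from the average
        return w

It would be called exactly like train_batch_accumulate above.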
  • Which of these is truly the correct form of the batch method?
  • In the case of the first one, doesn't accumulating all the delta weights result in a huge number?

Best Answer

Using the average or the sum is equivalent, in the sense that there exist pairs of learning rates for which they produce the same update.

To confirm this, first recall the update rule:

$$\Delta w_{ij} = -\alpha \frac{\partial E}{\partial w_{ij}}$$

Then, let $\mu_E$ be the average error over an epoch for a dataset of size $n$. The summed error is then $n\mu_E$, and because $n$ doesn't depend on $w$, this holds:

$$\Delta w_{ij} = -\alpha \frac{\partial (n\mu_E)}{\partial w_{ij}} = -\alpha n\frac{\partial \mu_E}{\partial w_{ij}}$$

In other words, a sum-based update with learning rate $\alpha$ is identical to an average-based update with learning rate $n\alpha$.
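
A quick numeric sanity check of this claim, for a single linear neuron with squared error (the data and $\alpha$ below are arbitrary demo values):

    import numpy as np

    # Check: a sum-based update with rate alpha equals an average-based
    # update with rate n*alpha, for E_i = 0.5*(d_i - y_i)**2 and y = X @ w.
    rng = np.random.default_rng(0)
    n = 5
    X = rng.normal(size=(n, 3))
    d = rng.normal(size=n)
    w = rng.normal(size=3)

    y = X @ w
    grads = -(d - y)[:, None] * X                # row i holds dE_i/dw

    alpha = 0.1
    update_sum = -alpha * grads.sum(axis=0)
    update_avg = -(n * alpha) * grads.mean(axis=0)
    print(np.allclose(update_sum, update_avg))   # True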

To your second question: the phrase "accumulating the delta weights" suggests that one of these methods retains weight updates. That isn't the case: batch learning accumulates error, and there is only one, single $\Delta w$ vector in a given epoch, so nothing grows without bound across epochs. (Your pseudocode never forms an explicit $\Delta w$; it updates the layers directly from the averaged error, after which a $\Delta w$ could be discarded anyway.)
