Hi there.
I need your confirmation or rejection for this question.
In the following code, if the minibatch size is h,
[grad,loss] = dlfeval(@modelGradients,dlnet,dlX_miniBatch,Y_miniBatch);
is grad the average of the gradients of the loss over the h samples? Does it compute the gradients automatically and, at the end, return:
grad = \frac{1}{h} \sum_{i=1}^{h} \nabla \mathrm{loss}(y_i, \hat{y}_i) ?
Following on from this, to compute the total loss and gradient for a full batch, should we take the average of the losses and the average of the gradients over the number of minibatches (say, 1000 minibatches, each of size h)?
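To make the second question concrete, here is the identity I have in mind, written as a sketch under assumptions that are mine and not confirmed by the code above: N is the number of minibatches (1000 in my example), g_k is the grad returned for minibatch k, \hat{y} stands for yHat, every minibatch has exactly h samples, the loss inside modelGradients is already normalized by h, and all g_k are evaluated with the same network parameters (no update between minibatches).

```latex
% Assumed notation (not from the original code): N minibatches of equal size h,
% g_k = gradient returned for minibatch k, all at the same parameters.
\begin{align*}
\frac{1}{N}\sum_{k=1}^{N} g_k
  &= \frac{1}{N}\sum_{k=1}^{N}\left(\frac{1}{h}\sum_{i=1}^{h}
     \nabla \mathrm{loss}\bigl(y_{k,i}, \hat{y}_{k,i}\bigr)\right) \\
  &= \frac{1}{N h}\sum_{k=1}^{N}\sum_{i=1}^{h}
     \nabla \mathrm{loss}\bigl(y_{k,i}, \hat{y}_{k,i}\bigr)
\end{align*}
```

If those assumptions hold, the last line is the mean gradient over all N·h samples, and the same argument applies to the loss values. If the last minibatch is smaller than h, a plain average over minibatches would presumably no longer match, and each minibatch would have to be weighted by its size.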
Best Answer