Solved – In neural networks, how to compute the mean square error (MSE) in a gradient update when using a minibatch

deep learning, gradient, gradient descent, neural networks, torch

I've been using a siamese neural network for the binary classification of biological data.
I've implemented a Torch version of this algorithm, including a stochastic gradient update function.

At each iteration, this function reads one input profile and its corresponding target label (true/false), applies back-propagation, and finally generates one predicted value, which I then use to compute the confusion matrix. That is, I have 1 input profile, 1 target, and 1 predicted output value.

To check the performance of this gradient update function, I can compute the mean square error, which for a single sample reduces to MSE = (targetValue - predictedValue)^2. This is very useful.

Then I wanted to implement a mini-batch gradient update function: a function that reads N input profiles and their corresponding N target labels (true/false). But since my siamese neural network architecture has only a single final output, it produces a single predicted value.

My problem is that, in this case, I don't know how to compute the mean square error (MSE). I could do it if I had N output values, but since I only have 1 predicted value, what should I do?
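For reference, this is the computation I would do if I had N predictions (a sketch, assuming `predictions` and `targets` are 1-D Torch Tensors of length N; the values below are just an example):

```lua
require 'torch'

-- MSE over N predictions is just the average of the N squared errors
local targets     = torch.Tensor({1, -1, 1, -1})
local predictions = torch.Tensor({0.8, -0.6, 0.9, 0.1})

local mse = (targets - predictions):pow(2):mean()
print(mse)
```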

Do you guys have any suggestions? How should I compute the MSE in a minibatch gradient update?

Or am I doing something wrong?


My Torch code:

Gradient update for the siamese neural network:

function gradientUpdate(generalPerceptron, input_profile, targetValue, learningRate)

   -- dataset convention: expose a size() method on the input
   function input_profile:size() return #input_profile end

   local predictionValue = generalPerceptron:forward(input_profile)[1]

   -- hinge-style update: only adjust the weights if the sample is
   -- misclassified or falls inside the margin
   if predictionValue * targetValue < 1 then
      local gradientWrtOutput = torch.Tensor({-targetValue})
      generalPerceptron:zeroGradParameters()
      generalPerceptron:backward(input_profile, gradientWrtOutput)
      generalPerceptron:updateParameters(learningRate)
   end

   -- squared error for this single sample
   local meanSquareError = math.pow(targetValue - predictionValue, 2)

   return generalPerceptron, meanSquareError
end

Minibatch gradient update for the siamese neural network:

function gradientUpdateMinibatch(generalPerceptron, input_vector, targetVector, learningRate)

   -- dataset convention: expose a size() method on the input
   function input_vector:size() return #input_vector end

   -- PROBLEM: forward() over the whole minibatch still yields a single
   -- predicted value, so there is nothing to average for the MSE
   local predictionValue = generalPerceptron:forward(input_vector)[1]

   local gradientWrtOutput = -targetVector
   generalPerceptron:zeroGradParameters()
   generalPerceptron:backward(input_vector, gradientWrtOutput)
   generalPerceptron:updateParameters(learningRate)

   return generalPerceptron
end

Best Answer

If you want N output values instead of 1, you should implement generalPerceptron:forward() so that, when it receives an NxM input matrix (i.e. N samples with M features), it outputs N values. That is, it should perform a matrix multiplication between the input and the weights of the network.
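This is, in fact, how the standard nn layers already behave when given a 2-D input; a minimal sketch (the layer and batch sizes here are just an example, not the asker's architecture):

```lua
require 'nn'

local N, M = 4, 3                 -- 4 samples, 3 features each
local layer = nn.Linear(M, 1)     -- weight matrix of size 1xM, one output unit

local batch  = torch.randn(N, M)  -- NxM input matrix
local output = layer:forward(batch)

print(output:size())              -- Nx1: one prediction per sample
```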

EDIT: based on your comment. If you cannot modify forward(), you can just iterate over the input samples one at a time. I.e. you take one input sample, get a prediction using forward() on that input sample only, and use that to calculate the gradient. If you want to do minibatch with MSE, you would do something like this (pseudocode-ish):

For each minibatch:
    sumgrad = 0
    For each x_i in this minibatch:
        yhat = generalPerceptron:forward(x_i)
        error = 0.5 * (target_i - yhat)^2    -- squared error
        sumgrad += gradient(error)           -- accumulate gradient
    weightupdate(learningRate, sumgrad / size(minibatch))  -- update with the gradient
                                                           -- average; not sure of the
                                                           -- right Torch functions

I.e. you accumulate the gradient over the minibatch samples and use it for your weight update. A quick Google search turns up this Torch tutorial code for minibatch learning.
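The loop above can be sketched in Torch as follows. This is a sketch under stated assumptions, not the asker's exact setup: `inputs` is a Lua table of per-sample input Tensors, `targets` a table of numbers, the gradient is taken for the 0.5*(t - y)^2 loss, and it relies on the fact that backward() accumulates gradients until zeroGradParameters() is called:

```lua
require 'nn'

-- Loop-based minibatch update with MSE for a model with a single scalar output.
function gradientUpdateMinibatchLoop(generalPerceptron, inputs, targets, learningRate)
   local batchSize = #inputs
   local sumSquaredError = 0

   -- backward() *accumulates* gradients, so zero them once per minibatch
   generalPerceptron:zeroGradParameters()

   for i = 1, batchSize do
      local prediction = generalPerceptron:forward(inputs[i])[1]
      local err = targets[i] - prediction
      sumSquaredError = sumSquaredError + err * err

      -- d/dy of 0.5 * (t - y)^2 is -(t - y)
      generalPerceptron:backward(inputs[i], torch.Tensor({-err}))
   end

   -- dividing the learning rate by the batch size is equivalent to
   -- updating with the *average* of the accumulated gradients
   generalPerceptron:updateParameters(learningRate / batchSize)

   local mse = sumSquaredError / batchSize   -- MSE over the whole minibatch
   return generalPerceptron, mse
end
```

Returning the averaged squared error gives you exactly the minibatch MSE the question asks for, computed from the per-sample predictions rather than from a single batch-level output.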