Solved – Why do Deep Learning libraries force the cost function to output a scalar

backpropagation, machine learning, neural networks, theano

Let's say we have a neural net with:

  • 5 input neurons
  • some arbitrary number of hidden layers
  • 3 output neurons

Let's say we're using minibatches of size 32. So,

  • if we input a 5×32 matrix into the neural net,
  • we will then get out a 3×32 matrix of output activations.

Assume we are using a simple MSE loss function. We take the difference between our 3×32 target matrix and the 3×32 output matrix, and square each entry of that difference elementwise, giving a new 3×32 matrix, which we'll call M.
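Concretely (a minimal NumPy sketch with random placeholder data, just to pin down the shapes involved; the variable names are illustrative):

    import numpy as np

    rng = np.random.default_rng(0)

    targets = rng.standard_normal((3, 32))   # 3x32 target matrix
    outputs = rng.standard_normal((3, 32))   # 3x32 output activations from the net

    M = (targets - outputs) ** 2             # elementwise squared errors, shape (3, 32)
    cost = M.mean()                          # single scalar: average over all 3*32 entries

    print(M.shape)   # (3, 32)
    print(cost)      # one float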

What I'm confused about is that I've seen lots of code in Theano and other deep learning libraries that takes the mean of this matrix for its cost function (i.e. it outputs a scalar, not a matrix).

For example, T.mean(T.pow(T - Y, 2)). Why is this? In my example, shouldn't Theano backprop M, not a scalar value? Even when I'm using minibatches, are these libraries just backpropping scalars?

Best Answer

All of machine learning comes down to minimizing the cost of some model. The most elementary operation in finding a minimum is comparing two values, and that is only well-defined for scalars. For example, given the two vectors [0, 2] and [2, 2], how would you decide which one is "smaller"? You would have to define some norm function: Euclidean, max, Manhattan, or your own fancy one. Whichever you choose, it must output a scalar value.
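To connect this back to the Theano example (a minimal sketch where a single linear layer stands in for the network; the variable names and the use of T.sqr are illustrative): T.grad requires the cost to be a scalar, but the gradient it returns has the same shape as whatever you differentiate with respect to. Differentiating the scalar mean with respect to the 3×32 output activations therefore still yields a full 3×32 matrix of per-element error signals, so nothing from M is lost during backprop.

    import numpy as np
    import theano
    import theano.tensor as T

    X = T.matrix('X')   # 5 x 32 minibatch of inputs
    Y = T.matrix('Y')   # 3 x 32 matrix of targets

    # One linear layer stands in for "some arbitrary number of hidden layers".
    W = theano.shared(np.random.randn(3, 5), name='W')
    outputs = T.dot(W, X)                 # 3 x 32 output activations

    cost = T.mean(T.sqr(outputs - Y))     # scalar MSE, like T.mean(T.pow(outputs - Y, 2))

    grad_out = T.grad(cost, outputs)      # 3 x 32: per-element errors, 2*(outputs - Y)/(3*32)
    grad_W = T.grad(cost, W)              # 3 x 5: what gradient descent actually updates

    f = theano.function([X, Y], [cost, grad_out, grad_W])

If you instead passed the full 3×32 matrix of squared errors as the cost, T.grad would refuse it, since Theano only differentiates scalar costs.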
