Solved – MSE and different types of activation functions in NN

Tags: mse, neural-networks

Let's say I have 3 neurons in the last layer of my neural network and I am using mean squared error as the loss function. The desired output of my neural network is the vector [false, true, false].

If the activation function of those neurons is the logistic sigmoid, they produce an output vector with values between 0 and 1, for example [0.05, 0.80, 0.15].

So, I encode false as 0 and true as 1, and I can calculate the loss like this:

$$
(0 - 0.05)^2 + (1 - 0.80)^2 + (0 - 0.15)^2 = 0.065
$$
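
To make the arithmetic concrete, here is a minimal NumPy sketch of that calculation using the example values above; it also shows the sum of squared errors alongside the mean, which is what "mean squared error" strictly refers to:

```python
import numpy as np

target = np.array([0.0, 1.0, 0.0])     # [false, true, false] encoded as 0/1
output = np.array([0.05, 0.80, 0.15])  # example sigmoid outputs

sse = np.sum((target - output) ** 2)   # sum of squared errors = 0.065
mse = np.mean((target - output) ** 2)  # mean squared error ≈ 0.0217
print(sse, mse)
```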

Now, let's say the activation function of the last layer of my neural network is the hyperbolic tangent, so it produces an output vector with values between -1 and 1, for example [-0.95, 0.85, -0.75]. So I encode false as -1 and true as 1, and my calculation of the loss looks like this:

$$
(-1 + 0.95)^2 + (1 - 0.85)^2 + (-1 + 0.75)^2 = 0.0875
$$
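
The same check for the tanh encoding, again a minimal NumPy sketch with the example values:

```python
import numpy as np

target = np.array([-1.0, 1.0, -1.0])     # [false, true, false] encoded as -1/1
output = np.array([-0.95, 0.85, -0.75])  # example tanh outputs

print(np.sum((target - output) ** 2))    # 0.0875
```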

That all makes sense to me. The question I have is: how do I encode the true and false values if the output of the last layer of my network does not have an upper or lower bound, e.g. if I am using ReLU as the activation function?

Best Answer

The standard¹ way to perform classification with neural networks is to use a sigmoid activation function with binary cross-entropy loss for a single binary output, and a linear activation followed by exponential normalization (softmax) with multinomial cross-entropy loss for a one-hot encoded output. There are good reasons why people use these.
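
As a rough illustration of those two standard setups, here is a minimal NumPy sketch (not a full training pipeline; the raw logit values are made up for the example):

```python
import numpy as np

# --- Single binary output: sigmoid activation + binary cross-entropy ---
z = 1.2                                   # raw network output (logit), made up
p = 1.0 / (1.0 + np.exp(-z))              # sigmoid maps it to P(true) in (0, 1)
y = 1.0                                   # target: true encoded as 1
bce = -(y * np.log(p) + (1 - y) * np.log(1 - p))

# --- One-hot output: linear outputs + softmax + multinomial cross-entropy ---
logits = np.array([0.3, 2.1, -0.5])               # raw linear outputs, made up
probs = np.exp(logits) / np.sum(np.exp(logits))   # softmax normalization
target = np.array([0.0, 1.0, 0.0])                # one-hot [false, true, false]
ce = -np.sum(target * np.log(probs))

print(bce, ce)
```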

If you want to use ReLU and mean squared error, be prepared for things probably not working optimally. But in principle it does not matter how you encode your values; your network should learn to predict them. It could be 0 and 1, or 0 and 34, or anything else. You cannot, however, expect the outputs of the network to stay within that range for arbitrary inputs. If that is among your requirements, use a bounded activation function.
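
For example, a small sketch of the ReLU + MSE case; the ReLU outputs and the alternative encoding are made-up numbers, chosen only to show that the target encoding is arbitrary and that nothing forces the outputs into its range:

```python
import numpy as np

relu_output = np.array([0.1, 27.9, 3.2])   # unbounded non-negative outputs (made up)

# Encoding A: false = 0, true = 1
mse_a = np.mean((np.array([0.0, 1.0, 0.0]) - relu_output) ** 2)

# Encoding B: false = 0, true = 34 -- equally valid as a regression target
mse_b = np.mean((np.array([0.0, 34.0, 0.0]) - relu_output) ** 2)

print(mse_a, mse_b)   # the outputs need not stay in [0, 1] or [0, 34]
```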


¹ Understand as: meaningful, accepted, working.