Solved – Confusion in Softmax function in CNN

computer vision, conv-neural-network, deep learning, loss-functions

I have recently started working on binary classification using a Convolutional Neural Network (CNN). While training, I get two outputs: a binary error (BinErr) and the value of the loss function (Objective) of the softmax classifier, which is converging to zero.
[Training plot showing BinErr and Objective]

Studying the softmax classifier first, I understand that in machine learning we have to minimize an objective function with respect to the parameters (weights and biases); the objective consists of a loss term and a regularization term. I found these slides for understanding softmax.

The loss function of the softmax classifier is defined as:
\begin{equation}
p_j = \frac{e^{o_j}}{\sum_k e^{o_k}}
\end{equation}
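
For concreteness, here is that formula as a small NumPy sketch of my own (the max-subtraction is just a standard numerical-stability trick, not something from the slides):

```python
import numpy as np

def softmax(o):
    """Map the final layer's raw outputs o to probabilities p_j."""
    # Subtracting the max does not change the result (it cancels in the
    # ratio) but keeps the exponentials from overflowing.
    e = np.exp(o - np.max(o))
    return e / e.sum()

o = np.array([2.0, 1.0, 0.1])  # example outputs of the final layer
p = softmax(o)                 # approx. [0.659, 0.242, 0.099], sums to 1
```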

What I couldn't understand after that is that, in the slides, $o$ is shown as the output of the final layer of the NN and the first input to the softmax classifier, while the second input $y$ is the actual features.

I am totally confused about how this works. Can anyone help me understand how softmax works in a CNN?

Best Answer

The formula you've written is not a loss function; it's just the formula for softmax.

The softmax activation is normally applied to the very last layer in a neural net, instead of ReLU, sigmoid, tanh, or another activation function. Softmax is useful because it converts the output of the last layer of your neural network into what is essentially a probability distribution. If you look at the origins of the cross-entropy loss function in information theory, you will see that it "expects" two probability distributions as input. That's why softmax output with cross-entropy loss is so common.
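
As a minimal sketch (plain NumPy, not any particular framework's API), here is softmax producing a distribution and cross entropy comparing it against a one-hot target. Note how the loss heads toward zero as the predicted probability of the true class heads toward one, which is what your converging Objective reflects:

```python
import numpy as np

def softmax(o):
    e = np.exp(o - np.max(o))  # stabilized exponentials
    return e / e.sum()         # normalized: a probability distribution

def cross_entropy(p, y):
    """Cross entropy between predicted distribution p and one-hot target y."""
    return -np.sum(y * np.log(p))

logits = np.array([2.0, -1.0])  # raw outputs of the last layer (binary case)
target = np.array([1.0, 0.0])   # one-hot encoding of the true class

p = softmax(logits)              # approx. [0.953, 0.047]
loss = cross_entropy(p, target)  # approx. 0.049, shrinks as p[0] -> 1
```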

Just to reiterate: softmax is typically viewed as an activation function, like sigmoid or ReLU. Softmax is NOT a loss function, but it is used to make the output of a neural net "compatible" with the cross-entropy or negative log-likelihood loss functions.
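
One practical note, using PyTorch purely as an example: many frameworks fold the softmax into the loss for numerical stability, so `nn.CrossEntropyLoss` expects the raw last-layer outputs (logits) and applies log-softmax internally. In that case you would not add an explicit softmax layer yourself:

```python
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()     # log-softmax + negative log likelihood

logits = torch.tensor([[2.0, -1.0]])  # raw scores: one sample, two classes
target = torch.tensor([0])            # index of the true class

loss = criterion(logits, target)      # no explicit softmax applied beforehand
```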