Solved – Does the activation function of the output layer differ between training and an already trained network

neural networks

I'm creating an OCR app, and so far it seems to work. It's quite similar to the example from the Coursera Machine Learning course.

The output layer of the network has as many neurons as there are classes to recognize: 10 digits, 26 letters, and so on. Once the network is trained, it should basically output [1,0,0,…,0] for the letter A, [0,1,0,…,0] for the letter B, and so on.
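For illustration, here is a minimal sketch of how such one-hot target vectors can be built (assuming NumPy and a hypothetical class ordering A → 0, B → 1, …, which is my own choice, not from the course):

```python
import numpy as np

def one_hot(class_index, num_classes):
    """Target vector with 1 at the class position and 0 everywhere else."""
    target = np.zeros(num_classes)
    target[class_index] = 1.0
    return target

# Hypothetical ordering: A -> 0, B -> 1, ...
print(one_hot(0, 26))  # target for letter A: [1, 0, 0, ..., 0]
print(one_hot(1, 26))  # target for letter B: [0, 1, 0, ..., 0]
```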

In the description of the hidden layers (during the Coursera lessons), it is said that they use the sigmoid function as the activation function, and I'm not sure whether the output layer also uses the sigmoid.

So, regarding the activation function: what differs between training the network and using it once it is already trained?

Is it the same activation function, except that I should turn the largest output value into 1 and the rest into 0, or is there something I'm missing here?

An explanation or links would be equally helpful. 🙂

Best Answer

Yes, that is correct: everything stays the same between training and test; you just take the maximum output to determine the classification. However, rather than a sigmoid on each class output, you might consider training (and testing) with a softmax function on the output layer, which gives you a 'true' probability for each class (the outputs sum to 1): Softmax
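To make this concrete, here is a minimal NumPy sketch (the function names and example values are mine, not from the course) contrasting per-class sigmoid outputs with softmax, and showing that prediction is just an argmax in either case:

```python
import numpy as np

def sigmoid(z):
    """Element-wise sigmoid: each output is an independent value in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    """Softmax over the output layer: values in (0, 1) that sum to 1."""
    e = np.exp(z - np.max(z))  # subtract the max for numerical stability
    return e / e.sum()

# Hypothetical pre-activation values of the output layer for one input image
z = np.array([2.0, -1.0, 0.5])

print(sigmoid(z))             # approx. [0.88, 0.27, 0.62] -- does not sum to 1
print(softmax(z))             # approx. [0.79, 0.04, 0.18] -- sums to 1
print(np.argmax(softmax(z)))  # predicted class index: 0
```

Note that argmax of the softmax equals argmax of the raw outputs, so the choice of output activation does not change the predicted class; it changes how you interpret the scores and how the loss is computed during training.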