Solved – Division by zero with cross entropy cost function

neural networks

I am using tanh as the activation function for my NN. I was also using the cross-entropy cost function previously, when I had sigmoid neurons. A sigmoid neuron's output can never reach zero, but a tanh's can, so when I train the NN I get division-by-zero errors. I switched back to the quadratic cost function, but it converges slowly. Is there a way to use the cross-entropy cost with tanh, or is there something better I could use?
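To be concrete, the cross-entropy I have in mind is the usual binary form (my code may differ in details), where $a$ is the neuron's output and $y$ the target:

$$C = -\frac{1}{n}\sum_x \big[\, y \ln a + (1 - y) \ln(1 - a) \,\big]$$

Its derivative with respect to the output is $\frac{\partial C}{\partial a} = \frac{a - y}{a(1 - a)}$, so both the cost and the gradient need $a$ to stay strictly between 0 and 1. A sigmoid output does, but tanh ranges over $[-1, 1]$ and can hit 0 exactly (or go negative), which is where the division-by-zero errors come from.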

Best Answer

It's common to use softmax as the final layer. It converts the raw output values into probabilities: every output is strictly positive and they sum to 1, so the logs in the cross-entropy cost are always well-defined. If you use softmax as the activation function for the final layer, you can use any activation you like, including tanh, for the previous layers.
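As a minimal sketch (NumPy, with made-up sizes and targets rather than your actual network), here is a softmax output layer paired with cross-entropy. When the two are combined, the gradient with respect to the pre-softmax inputs reduces to probs - targets, so backpropagation never divides by an activation:

```python
import numpy as np

def softmax(z):
    # Shift by the row max for numerical stability; the result is unchanged.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(probs, y_onehot, eps=1e-12):
    # eps guards the log; softmax outputs are strictly positive anyway.
    return -np.mean(np.sum(y_onehot * np.log(probs + eps), axis=1))

def output_delta(probs, y_onehot):
    # Per-sample gradient of the cross-entropy w.r.t. the pre-softmax inputs z.
    # The softmax and the log cancel, so nothing is divided by an activation --
    # this is what removes the division-by-zero problem.
    return probs - y_onehot

# Tiny example with made-up numbers: 2 samples, 3 classes.
z = np.array([[2.0, -1.0, 0.5],
              [0.1,  0.3, -0.2]])        # pre-activation outputs of the last layer
y = np.array([[1, 0, 0],
              [0, 1, 0]], dtype=float)   # one-hot targets

p = softmax(z)
print(cross_entropy(p, y))   # scalar cost
print(output_delta(p, y))    # backprop signal for the output layer
```

The hidden layers in front of this can stay tanh; only the output layer changes.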