Solved – Matching loss function for tanh units in a neural net

loss-functions, neural-networks

There's not much more I can add to the question. Googling has mostly turned up research papers on springerlink and other sites I don't have access to.

Given a neural network model with $\tanh(x)$ as the output non-linearity, what is the appropriate matching loss function to use?

-Brian

Best Answer

The loss function is chosen according to the noise process assumed to contaminate the data, not according to the output layer activation function. The purpose of the output layer activation function is to apply whatever constraints ought to hold on the output of the model. There is a correspondence between loss function and activation function that can simplify the implementation of the model, but that is pretty much the only real benefit (cf. link functions in Generalised Linear Models), as neural net people generally don't go in much for analysis of parameters etc. Note that the tanh function is a scaled and translated version of the logistic sigmoid function, so a modified logistic loss with recoded targets might be a good match from that perspective.
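To make the last point concrete, here is a minimal sketch (the function name is illustrative, not from the answer) of that "modified logistic loss": since $\tanh(a) = 2\sigma(2a) - 1$, rescaling the tanh output from $(-1, 1)$ to $(0, 1)$ and recoding $\{-1, +1\}$ targets to $\{0, 1\}$ recovers the usual Bernoulli cross-entropy. The "matching" property shows up in the gradient: with this pairing, $\partial L / \partial a = \tanh(a) - t$, just as sigmoid plus cross-entropy gives $\sigma(a) - t$.

```python
import numpy as np

def tanh_matching_loss(y, t, eps=1e-12):
    """Cross-entropy loss matched to tanh outputs.

    y : tanh activations in (-1, 1)
    t : targets recoded as -1 or +1
    """
    p = (y + 1.0) / 2.0              # rescale tanh output to (0, 1)
    q = (t + 1.0) / 2.0              # recode {-1, +1} targets to {0, 1}
    p = np.clip(p, eps, 1.0 - eps)   # guard the logs
    return -np.mean(q * np.log(p) + (1.0 - q) * np.log(1.0 - p))
```

A quick sanity check is that the loss shrinks as the output moves toward the recoded target, and that a numerical gradient through `tanh` matches `tanh(a) - t`.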

Related Question