Neural net cost function for Hyperbolic Tangent activation

neural networks

In Andrew Ng's online machine learning course, in the part about neural networks for classification, the following convex cost function is given:
$$\text{cost} = -y\log(h_\theta(x)) - (1-y)\log(1-h_\theta(x))$$
which is predicated on the output labels being either 1 or 0 (as with a sigmoid activation function). I would like to code a NN classifier using a hyperbolic tangent activation function, specifically the one given in LeCun 1998, shown below:
$$1.7159\tanh\left(\tfrac{2}{3}x\right)$$
which saturates at values greater than $1$ and less than $-1$ (approaching $\pm 1.7159$) in the limit.
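For concreteness, here is a minimal NumPy sketch of that activation (the function name is mine):

```python
import numpy as np

def lecun_tanh(x):
    """Scaled tanh from LeCun 1998 ("Efficient BackProp").

    The constants are chosen so that f(1) = 1.7159 * tanh(2/3) ~ 1,
    and the output saturates at +/-1.7159 as x -> +/-inf.
    """
    return 1.7159 * np.tanh((2.0 / 3.0) * x)
```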

Obviously the above cost function would not be suitable for this activation function, so could anyone tell me what an analogous cost function for this hyperbolic tangent activation should look like? So far my web search has been fruitless.

Best Answer

The cost function used with the sigmoid was motivated by maximum likelihood estimation, and $$\text{cost}=-y\log(h_\theta(x))-(1-y)\log(1-h_\theta(x))$$ is just another way of saying $$\text{cost}=-\log(h_\theta(x))$$ when $y=1$ and $$\text{cost}=-\log(1-h_\theta(x))$$ when $y=0$.
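A quick numerical sketch of that case analysis (the value of $h$ is made up):

```python
import numpy as np

def cross_entropy(h, y):
    # per-example logistic cost: -y*log(h) - (1-y)*log(1-h)
    return -y * np.log(h) - (1 - y) * np.log(1 - h)

h = 0.8                      # hypothetical sigmoid output
print(cross_entropy(h, 1))   # -log(0.8) ~ 0.223, the y = 1 branch
print(cross_entropy(h, 0))   # -log(0.2) ~ 1.609, the y = 0 branch
```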

Those motivations still exist no matter what the activation function is (sigmoid or hyperbolic tangent). I would map the hyperbolic tangent from its range $(-1,1)$ to the sigmoid's range $(0,1)$, so that:

$$\text{cost} = -\frac{y+1}{2}\log\left(\frac{h_\theta(x)+1}{2}\right) - \left(1-\frac{y+1}{2}\right)\log\left(1-\frac{h_\theta(x)+1}{2}\right)$$

where $h_\theta(x) = \tanh\left(\frac{2}{3}x\right)$ (the $1.7159$ scale factor is dropped, so the output stays in $(-1,1)$), and the labels are recoded as $y \in \{-1, +1\}$ so that $\frac{y+1}{2} \in \{0, 1\}$.
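A minimal NumPy sketch of this cost, assuming the labels have been recoded to $\pm 1$ (function names are mine):

```python
import numpy as np

def tanh_hypothesis(x):
    # h_theta(x) = tanh((2/3) x), output in (-1, 1)
    return np.tanh((2.0 / 3.0) * x)

def tanh_cross_entropy(h, y):
    # Map the prediction from (-1, 1) and the label from {-1, +1}
    # onto (0, 1) and {0, 1}, then apply the usual logistic cost.
    p = (h + 1.0) / 2.0
    t = (y + 1.0) / 2.0
    return -t * np.log(p) - (1.0 - t) * np.log(1.0 - p)

# hypothetical example: pre-activation 0.9, true label +1
h = tanh_hypothesis(0.9)
print(tanh_cross_entropy(h, 1.0))   # small cost: prediction agrees with label
```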

This will have a different gradient than the sigmoid. Good luck.
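For what it's worth, the gradient works out to a familiar form. Writing $p=\frac{h_\theta(x)+1}{2}$ and $t=\frac{y+1}{2}$, and using $1-h_\theta(x)^2=(1-h_\theta(x))(1+h_\theta(x))=4p(1-p)$, the chain rule gives

$$\frac{\partial\,\text{cost}}{\partial x}=\frac{p-t}{p(1-p)}\cdot\frac{1}{2}\cdot\frac{2}{3}\left(1-h_\theta(x)^2\right)=\frac{4}{3}(p-t)=\frac{2}{3}\left(h_\theta(x)-y\right),$$

the tanh analogue of the $h_\theta(x)-y$ error term from the sigmoid case.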
