I have a simple neural network that works with the logistic function as the activation function. Now I want to avoid the saturation problem by replacing the logistic function with the scaled hyperbolic tangent:
#define SIGMOID(x) (1.7159*tanh(0.66666667*(x)))                    /* f(x) = 1.7159 * tanh(2x/3) */
#define DSIGMOID(S) (0.666666667/1.7159*(1.7159-(S))*(1.7159+(S)))  /* f'(x), with S = SIGMOID(x) */
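For reference, DSIGMOID(S) expects the activation output S = SIGMOID(x), not the raw input x. A quick standalone finite-difference check of the two macros (just a sketch, separate from the network code) looks like this:

#include <stdio.h>
#include <math.h>

#define SIGMOID(x) (1.7159*tanh(0.66666667*(x)))
#define DSIGMOID(S) (0.666666667/1.7159*(1.7159-(S))*(1.7159+(S)))

int main(void)
{
    const double h = 1e-5;
    for (double x = -3.0; x <= 3.0; x += 1.0) {
        double s        = SIGMOID(x);
        double analytic = DSIGMOID(s);                                /* derivative expressed via the output S */
        double numeric  = (SIGMOID(x + h) - SIGMOID(x - h)) / (2.0 * h);
        printf("x=% .1f  analytic=%.6f  numeric=%.6f\n", x, analytic, numeric);
    }
    return 0;
}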
But the network never converges; the MSE stays the same throughout training.
Here are my training samples:
double training_data[][4]={
{0, 0, 0, -1},
{0, 0, 1, 1},
{0, 1, 0, 1},
{0, 1, 1, -1},
{1, 0, 0, 1},
{1, 0, 1, -1},
{1, 1, 0, -1},
{1, 1, 1, 1}};
The network does converge if I use the original (non-scaled) hyperbolic tangent function, that is:
#define SIGMOID(x) (tanh(x))
#define DSIGMOID(S) (1-((S)*(S)))
Am I missing something? E.g. do I need to scale the outputs to match the range (-1.7159, 1.7159)?
Best Answer
When I plot the sigmoid together with DSIGMOID evaluated directly on the raw input x, using R code along these lines:
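# note: dy is DSIGMOID applied to x itself, not to y = SIGMOID(x)
x  <- seq(-4, 4, by = 0.01)
y  <- 1.7159 * tanh(0.66666667 * x)                      # SIGMOID(x), drawn in red
dy <- 0.666666667/1.7159 * (1.7159 - x) * (1.7159 + x)   # DSIGMOID fed x, drawn in black
plot(x, dy, type = "l", col = "black", ylab = "")
lines(x, y, col = "red")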
The resulting plot shows the sigmoid (red) and something that is not a great derivative (black). Notice the negative values: over part of the input range this "derivative" has the wrong sign, which defines a radius of convergence that shoots Newton's-method-style updates off toward infinity.
Now, feeding DSIGMOID the sigmoid output S = SIGMOID(x) instead, with R code along these lines:
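# reuses x and y from the snippet above; dy2 is DSIGMOID applied to the output y
dy2 <- 0.666666667/1.7159 * (1.7159 - y) * (1.7159 + y)
plot(x, y, type = "l", col = "red", ylab = "")
lines(x, dy2, col = "black")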
The resulting plot is a much more plausible graph of the derivative (black) for the sigmoid (red).
This was fun: link.
Edit:
Here are some basics on Tanh and friends.
Please notice in link 1 that the derivative of the hyperbolic tangent is pow(hyperbolic_secant, 2), not pow(hyperbolic_cosine, 2).
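A quick numeric check of that identity (a small sketch in base R; sech(x) is written here as 1/cosh(x)):

x <- seq(-3, 3, by = 0.5)
h <- 1e-6
numeric_deriv <- (tanh(x + h) - tanh(x - h)) / (2 * h)   # central difference of tanh
sech_squared  <- 1 / cosh(x)^2                           # sech(x)^2
max(abs(numeric_deriv - sech_squared))                   # effectively zero, up to rounding error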