Solved – How to use 1.7159 * tanh(2/3 * x) as activation function

neural networks

I have a simple neural network that works with the logistic function as its activation function. Now I want to avoid the saturation problem by replacing the logistic function with a scaled hyperbolic tangent:

#define SIGMOID(x) (1.7159*tanh(0.666666667*(x)))
#define DSIGMOID(S) (0.666666667/1.7159*(1.7159-(S))*(1.7159+(S)))

But the network never converges; the MSE stays the same throughout training.
Here are my training samples:

/* Columns: three binary inputs, then the target (+1 for odd parity, -1 for even). */
double training_data[][4] = {
    {0, 0, 0, -1},
    {0, 0, 1,  1},
    {0, 1, 0,  1},
    {0, 1, 1, -1},
    {1, 0, 0,  1},
    {1, 0, 1, -1},
    {1, 1, 0, -1},
    {1, 1, 1,  1}};

The network does converge if I use the original (non-scaled) hyperbolic tangent function, that is:

#define SIGMOID(x) (tanh(x))
#define DSIGMOID(S) (1-((S)*(S)))

Am I missing something? For example, do I need to scale the outputs to match the range (-1.7159, 1.7159)?

Best Answer

When I plot both expressions as functions of x using the following R code:

x <- seq(from = -2, to = 2, by = 0.01 )
y <- (0.666666667/1.7159*(1.7159-(x))*(1.7159+(x)))
y2 <- (1.7159*tanh(0.66666667*x)) 

plot(x,y2,col = "red")
points(x,y)

I get the following plot: [image: plot of the given expressions]

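One of these is a sigmoid (red); the other (black) is not a plausible derivative when plotted against the input x. Notice the negative values for |x| > 1.7159. A negative slope for an increasing function defines a radius of convergence beyond which Newton-style and other gradient-based updates shoot toward infinity. Note that the black expression does equal the true derivative when its argument is the activation output S = SIGMOID(x) rather than x, because |S| never exceeds 1.7159.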

Now using this R-code:

x <- seq(from = -2, to = 2, by = 0.01 )
y <- 1.14393*(1/cosh(2*x/3))^2
y2 <- (1.7159*tanh(0.66666667*x)) 

plot(x,y2,col = "red", type = "b")
points(x,y)

I get this plot: [image: updated plot of the expressions]

This is a more plausible graph of the derivative (black) of the sigmoid (red): it is positive everywhere and peaks at about 1.14393 at x = 0.


Edit:

Here are some basics on tanh and related functions.

  1. http://mathworld.wolfram.com/HyperbolicTangent.html
  2. http://mathworld.wolfram.com/HyperbolicCosine.html
  3. http://mathworld.wolfram.com/HyperbolicSine.html

Note in link 1 that the derivative of the hyperbolic tangent is the square of the hyperbolic secant, pow(sech(x), 2), and not the square of the hyperbolic cosine, pow(cosh(x), 2).
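For reference, the derivative of the scaled function can be worked out from that identity. In the derivation below, the symbols a = 1.7159 and b = 2/3 are shorthand introduced here for illustration; they do not appear in the original code.

```latex
\begin{aligned}
f(x)  &= a \tanh(bx), \qquad a = 1.7159,\ b = \tfrac{2}{3} \\
f'(x) &= ab \,\operatorname{sech}^2(bx)
       = ab \left(1 - \tanh^2(bx)\right) \\
      &= ab \left(1 - \frac{f(x)^2}{a^2}\right)
       = \frac{b}{a}\,\bigl(a - f(x)\bigr)\bigl(a + f(x)\bigr),
\qquad ab \approx 1.14393
\end{aligned}
```

The first form takes the input x; the last form takes the output f(x). Both describe the same slope, so the two should give identical values as long as each is fed the quantity it expects.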