Solved – Robust softmax solutions for Theano

classification, neural networks, python, theano

I am implementing multilayer perceptrons with the softmax activation function in Theano. In some extreme cases, very large or very small values going into the softmax produce output distributions that are exactly zero in some places.

When I compute the logarithm of these distributions I get -inf, and the error propagates through the rest of the code.

My simple solution was adding a small constant to the distribution like this:

self.p_y_given_x = T.nnet.softmax(T.dot(input, self.W) + self.b) + 0.0000001
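
For illustration, here is a minimal NumPy sketch (outside Theano) of the underflow that produces the -inf: the small softmax entries round to exactly zero, and log(0) is -inf.

import numpy as np

# Logits that are far apart: the smaller softmax entries underflow to 0.
q = np.array([1000.0, 0.0])
p = np.exp(q - q.max()) / np.exp(q - q.max()).sum()
print(np.log(p))  # [  0. -inf], with a divide-by-zero warning from np.log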

I have already googled and found plenty of solutions that are more elegant (and exact) than mine, but the nature of Theano demands something different, since the log-likelihood will be symbolically differentiated to obtain the gradients for the algorithm.

Also, I find it strange that this problem is not commonly addressed for neural networks, logistic regression, or similar models. Are these kinds of values so extreme that they actually indicate a problem in another part of my system? Am I doing something wrong here, or missing some point?


Update 1: Theano can give you very different results depending on which mode you compile with. Here I think I was using mode=FAST_COMPILE, which apparently deactivates the numerical optimizations and stabilizations that the compiler applies to the function graphs. If you are doing this, try changing it to mode=FAST_RUN.
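
For reference, a minimal sketch of where the mode gets set when compiling (the tiny graph here is just a stand-in, not my actual model):

import theano
import theano.tensor as T

x = T.matrix('x')
nll = -T.mean(T.log(T.nnet.softmax(x)))
# FAST_RUN applies the graph optimizations (including the stabilizations);
# FAST_COMPILE skips most of them.
f = theano.function([x], nll, mode='FAST_RUN')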


Update 2: This page lists some of the graph optimizations performed by Theano, including one specific to softmax: local_log_softmax.
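
If you want to verify what the compiler actually did, theano.printing.debugprint can dump the optimized graph of a compiled function (assuming f is the function compiled in the sketch above):

import theano

# Dumps the optimized graph; with optimizations enabled, the naive
# log(softmax(...)) pattern should have been replaced by a stabilized op.
theano.printing.debugprint(f)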

Best Answer

Looks like you answered your own question. However, you should check how they implemented log-softmax. See my answer here for a numerically stable softmax function. Based on that, log-softmax should be:

import numpy as np

def log_softmax(q):
    # Shift by the largest logit (or 0) so no exponential can overflow.
    max_q = max(0.0, np.max(q))
    rebased_q = q - max_q
    # Stable form of q - log(1 + sum(exp(q))); exp(-max_q) is the implicit zero logit.
    return rebased_q - np.logaddexp(-max_q, np.logaddexp.reduce(rebased_q))

As long as your inputs are finite, I don't think this can ever be infinite.
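
As a quick sanity check (assuming NumPy and the log_softmax above), extreme logits that would send a naive log(softmax(q)) to -inf stay finite here:

q = np.array([1000.0, 0.0, -1000.0])
print(log_softmax(q))  # roughly [0., -1000., -2000.] – all entries finite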
