Solved – Robust softmax solutions for Theano

classification, neural networks, python, theano

I am implementing multilayer perceptrons with the softmax activation function in Theano. In some extreme cases, very large or very small values going into the softmax produce output distributions that are exactly zero in some places.

When I compute the logarithm of these distributions I get -inf, and the error propagates through the rest of the code.

My simple solution was adding a small constant to the distribution like this:

self.p_y_given_x = T.nnet.softmax(T.dot(input, self.W) + self.b) + 0.0000001
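
For illustration, here is a minimal NumPy sketch (outside Theano) of the underflow that produces the -inf: the small softmax entries round to exactly zero, and log(0) is -inf.

import numpy as np

# Logits that are far apart: the smaller softmax entries underflow to 0.
q = np.array([1000.0, 0.0])
p = np.exp(q - q.max()) / np.exp(q - q.max()).sum()
print(np.log(p))  # [  0. -inf], with a divide-by-zero warning from np.log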

I have already googled and found plenty of solutions that are more elegant (and exact) than mine, but the nature of Theano demands something different, since the log-likelihood will be symbolically differentiated to obtain the gradients for the algorithm.

Also, I find it strange that this problem is not commonly addressed for neural networks, logistic regression, or similar models. Are these kinds of values so extreme that they actually indicate a problem in another part of my system? Am I doing something wrong here, or missing some point?


Update 1: Theano can give you very different results depending on which mode you compile with. Here I think I was using mode=FAST_COMPILE, which apparently deactivates the numerical optimizations and stabilizations that the compiler applies to the function graphs. If you are doing this, try changing it to mode=FAST_RUN.
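
For reference, a minimal sketch of where the mode gets set when compiling (the tiny graph here is just a stand-in, not my actual model):

import theano
import theano.tensor as T

x = T.matrix('x')
nll = -T.mean(T.log(T.nnet.softmax(x)))
# FAST_RUN applies the graph optimizations (including the stabilizations);
# FAST_COMPILE skips most of them.
f = theano.function([x], nll, mode='FAST_RUN')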


Update 2: This page lists some of the graph optimizations performed by Theano, including one specific to softmax: local_log_softmax.
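
If you want to verify what the compiler actually did, theano.printing.debugprint can dump the optimized graph of a compiled function (assuming f is the function compiled in the sketch above):

import theano

# Dumps the optimized graph; with optimizations enabled, the naive
# log(softmax(...)) pattern should have been replaced by a stabilized op.
theano.printing.debugprint(f)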

Best Answer

Looks like you answered your own question. However, you should check how they implemented log-softmax. See my answer here for a numerically stable softmax function. Based on that, log-softmax should be:

import numpy as np

def log_softmax(q):
    # Shift by the largest logit (or 0) so no exponential can overflow.
    max_q = max(0.0, np.max(q))
    rebased_q = q - max_q
    # Stable form of q - log(1 + sum(exp(q))); exp(-max_q) is the implicit zero logit.
    return rebased_q - np.logaddexp(-max_q, np.logaddexp.reduce(rebased_q))

As long as your inputs are finite, I don't think this can ever be infinite.
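
As a quick sanity check (assuming NumPy and the log_softmax above), extreme logits that would send a naive log(softmax(q)) to -inf stay finite here:

q = np.array([1000.0, 0.0, -1000.0])
print(log_softmax(q))  # roughly [0., -1000., -2000.] – all entries finite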
