Solved – Cross entropy loss function and division by zero

entropy, neural networks

I'm trying out the cross entropy loss function for neural network training, per the arguments at https://jamesmccaffrey.wordpress.com/2013/11/05/why-you-should-use-cross-entropy-error-instead-of-classification-error-or-mean-squared-error-for-neural-network-classifier-training/ as to why it's better than mean squared error.

However, I'm getting division-by-zero errors that lead to infinite weights. Looking at the formula, e.g. as implemented in the tiny-cnn library:

// From tiny-cnn (float_t is the library's floating-point typedef);
// <cmath> is needed for std::log.
#include <cmath>

class cross_entropy_multiclass {
public:
    // Per-element loss for output y and target t: -t * log(y).
    static float_t f(float_t y, float_t t) {
        return -t * std::log(y);
    }

    // Derivative of the loss with respect to the output y: -t / y.
    static float_t df(float_t y, float_t t) {
        return -t / y;
    }
};

In one sense this is not surprising: it divides by zero every time the current output of a neuron, y, happens to be exactly zero.
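
For concreteness, here is a tiny standalone demo of that failure mode. It reproduces the two formulas above as free functions rather than calling tiny-cnn itself, so the names f and df here are just local copies for illustration:

#include <cmath>
#include <cstdio>

// Local copies of the tiny-cnn formulas shown above, for a self-contained demo.
static double f(double y, double t)  { return -t * std::log(y); }
static double df(double y, double t) { return -t / y; }

int main() {
    // With target t = 1 and an output that has collapsed to exactly 0,
    // both the loss and its gradient blow up under IEEE arithmetic.
    std::printf("f(0, 1)  = %f\n", f(0.0, 1.0));   // inf  (log(0) = -inf)
    std::printf("df(0, 1) = %f\n", df(0.0, 1.0));  // -inf (division by zero)
    return 0;
}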

In another sense it is surprising; if this were a known problem with cross entropy loss, I would expect it to be mentioned in some of the discussion I looked at.

Am I doing something wrong, is there a bug in tiny-cnn, or what else am I missing?

Best Answer

Your output neurons should use a sigmoid activation function, which ensures their values are strictly between 0 and 1 (exclusive). With y never exactly zero, log(y) stays finite and -t / y no longer divides by zero.
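
Here is a minimal sketch of that idea. The helper names and the epsilon value are my own assumptions, not tiny-cnn's API: squash the raw output through a sigmoid so it lies in (0, 1), and clamp it away from the endpoints before taking the log, so neither log(y) nor -t / y can blow up:

#include <algorithm>
#include <cmath>

// Hypothetical helpers for illustration only (not part of tiny-cnn).

// Sigmoid activation: maps any real input into the open interval (0, 1).
double sigmoid(double x) {
    return 1.0 / (1.0 + std::exp(-x));
}

// Cross-entropy loss with the output clamped away from 0 and 1, so that
// log(y) stays finite even if rounding pushes y to an endpoint.
double cross_entropy(double y, double t, double eps = 1e-12) {
    y = std::clamp(y, eps, 1.0 - eps);
    return -t * std::log(y);
}

// Matching derivative with the same clamp, so -t / y cannot divide by zero.
double cross_entropy_grad(double y, double t, double eps = 1e-12) {
    y = std::clamp(y, eps, 1.0 - eps);
    return -t / y;
}

Even with a sigmoid, double-precision rounding can return exactly 0.0 or 1.0 for very large or very small pre-activations, which is why clamping like this is a common belt-and-braces addition.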