I'm trying out the cross entropy loss function for neural network training, per the arguments at https://jamesmccaffrey.wordpress.com/2013/11/05/why-you-should-use-cross-entropy-error-instead-of-classification-error-or-mean-squared-error-for-neural-network-classifier-training/ as to why it's better than mean squared error.
However, I'm getting division by zero errors leading to infinite weights. Looking at the formula for it e.g. as implemented in the tiny-cnn library,
class cross_entropy_multiclass {
public:
    static float_t f(float_t y, float_t t) {
        return -t * std::log(y);
    }
    static float_t df(float_t y, float_t t) {
        return -t / y;
    }
};
in one sense this is not surprising, as it will give division by zero every time the current output of a neuron, y, happens to be zero.
In another sense it is surprising; if this were a known problem with cross entropy loss, I would expect it to be mentioned in some of the discussion I looked at.
Am I doing something wrong, or is there some sort of bug in tiny-cnn, or what else am I missing?
Best Answer
Your output neuron should use a sigmoid activation function, which keeps its output strictly between 0 and 1 (exclusive), so y is never zero and the division in df is well defined.
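To make this concrete, here is a minimal sketch rather than tiny-cnn's actual code. The names sigmoid, clamp_prob, cross_entropy, cross_entropy_grad and the constant kEps are hypothetical, and double stands in for the library's float_t. The clamp is an extra guard I am assuming you might want: in exact arithmetic the sigmoid never reaches 0, but in floating point a strongly saturated neuron can still round to exactly 0.

```cpp
#include <algorithm>
#include <cmath>

// Logistic sigmoid. Mathematically its value lies strictly in (0, 1);
// in floating point it can still round to exactly 0 or 1 for large |x|.
double sigmoid(double x) {
    return 1.0 / (1.0 + std::exp(-x));
}

// Hypothetical guard value (an assumption, not taken from tiny-cnn).
const double kEps = 1e-12;

// Keep a probability away from exactly 0 and exactly 1.
double clamp_prob(double y) {
    return std::min(std::max(y, kEps), 1.0 - kEps);
}

// Same formulas as in the question, evaluated on the clamped output.
double cross_entropy(double y, double t) {
    return -t * std::log(clamp_prob(y));
}

double cross_entropy_grad(double y, double t) {
    return -t / clamp_prob(y);
}
```

With this in place, cross_entropy_grad(sigmoid(a), t) stays finite for any finite pre-activation a, at the cost of a slightly biased loss when the output is extremely close to 0 or 1.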