Solved – Is the interpretation of Neural Network results correct

classification, measurement-error, model-evaluation, neural-networks, text-mining

I use a neural network with a 17-30-1 topology (sigmoid activation, atan error function, MSE as cost function, 5-fold CV) for text classification. (It is closely related to a previous question of mine.)

The input data is quite noisy, so I could live with a less-than-perfect classification score, but the results I get are probably too bad (or even just random), and so I am asking for your opinion.

  1. The training error is around 0.06-0.09 (MSE), i.e. on average each prediction differs by approx. 0.25-0.3 (the RMSE) from the target label; in this binary case with a class threshold of 0.5 this might be acceptable. What do you think?

  2. The test error (MSE) is unfortunately around 0.20, sometimes even 0.25; i.e. the effective error (RMSE) for a test sample is around 0.45-0.5, which to me means that the network a) suffers from high variance and b) is about as good as random guessing.
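The "effective error" quoted in both points is just the RMSE, i.e. the square root of the MSE; a small sketch (using the MSE values from the question) makes the conversion explicit:

```python
import math

# The question quotes MSE values; the typical per-sample deviation
# from the 0/1 label is the RMSE, i.e. sqrt(MSE).
for mse in (0.06, 0.09, 0.20, 0.25):
    print(f"MSE={mse:.2f} -> typical deviation ~ {math.sqrt(mse):.2f}")
# MSE=0.06 -> ~0.24, MSE=0.09 -> ~0.30, MSE=0.20 -> ~0.45, MSE=0.25 -> ~0.50
```

So a test MSE of 0.25 indeed means predictions are typically off by 0.5, which for a 0.5 threshold is indistinguishable from guessing.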

I don't need a perfect classification, but the network should at least capture the patterns in the input data. With these results, I think the neural network is more or less useless, or rather the input features are crap.

Best Answer

First, I would advise you not to use squared error but the cross-entropy error. Squared error follows from the assumption that your labels are corrupted by Gaussian noise, which is probably not the case for binary class labels.

To do so, the output of your network should be a softmax:

$z_k = \frac{\exp(y_k)}{\sum_i \exp(y_i)}$

This is basically a logistic regression layer on top of the neural network, and it gives you a proper probability. You can train it with the cross-entropy error function (see here for an explanation); the derivatives with respect to the network outputs stay the same as for squared error with linear outputs.

Regarding the interpretation of the results: this is data-set specific. If it is a hard task, the numbers may be fine. However, you should look at the actual number of correct classifications in the end and see if that is good enough for your application. In any case, I think you will get better results with cross entropy.
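To get that "actual number of correct classifications", threshold the predicted probabilities and compare against the labels; a minimal sketch with made-up probabilities and labels:

```python
import numpy as np

# Hypothetical predicted probabilities and true binary labels,
# just to show how to turn probabilistic outputs into accuracy.
probs  = np.array([0.9, 0.4, 0.7, 0.2, 0.6, 0.3])
labels = np.array([1,   0,   1,   0,   0,   1])

preds = (probs >= 0.5).astype(int)   # class threshold of 0.5
accuracy = (preds == labels).mean()
print(accuracy)  # 4 of 6 correct -> ~0.667
```

Accuracy (or a confusion matrix) is usually easier to judge against the application's requirements than an MSE or cross-entropy value.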