Solved – 100% accuracy on training, high accuracy on testing as well. What does this mean?

accuracy, computer-vision, conv-neural-network, neural-networks

I was training a model to classify different traffic signs and decided to use a pre-trained AlexNet model, redefining the last fully-connected layer to match the number of classes in the dataset. During training the loss quickly approached zero, and when I evaluated the model on the training set it gave me 100% accuracy. Naturally this led me to believe that the model had overfitted and wouldn't generalize well. However, when I evaluated it on the test data it had ~97% accuracy.
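For reference, a minimal sketch of the kind of setup described above, assuming PyTorch/torchvision (the class count of 43 corresponds to GTSRB; the rest of the training loop is omitted):

```python
import torch.nn as nn
from torchvision import models

# Load AlexNet pre-trained on ImageNet.
model = models.alexnet(pretrained=True)

# torchvision's AlexNet classifier is an nn.Sequential; index 6 is the final
# Linear layer (4096 -> 1000 for ImageNet). Replace it with 4096 -> 43 for GTSRB.
num_features = model.classifier[6].in_features
model.classifier[6] = nn.Linear(num_features, 43)
```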

The dataset I used was the GTSRB dataset, so I don't think there is any data leakage between the training and testing sets, and I even took a picture myself and it was classified correctly. I've always been told that 100% training accuracy with tiny loss is a bad sign, but the model seems to work well despite this. I know this is probably a good problem to have, but I'm curious: what could be the reason for this?

Best Answer

I'll caveat this with "I don't know anything about the particular image dataset you're talking about", but with that in mind:

In general, if you have 100% train accuracy, you've probably massively overfit. If, however, you have 97% test accuracy and you've done your cross-validation correctly, this in theory means that if you got some more data in, you would be able to classify it 97% correctly, which is most likely a very good thing, depending on the application. (There are exceptions: when your data is very imbalanced, 97% accuracy on its own might not tell you much, and you might care about precision and recall as well.)
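As a toy illustration of that caveat (hypothetical numbers, unrelated to GTSRB): with a 97:3 class imbalance, a model that always predicts the majority class already scores 97% accuracy while being useless on the minority class.

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score

y_true = [0] * 97 + [1] * 3   # 97% majority class, 3% minority class
y_pred = [0] * 100            # a "model" that always predicts the majority class

print(accuracy_score(y_true, y_pred))                    # 0.97
print(precision_score(y_true, y_pred, zero_division=0))  # 0.0
print(recall_score(y_true, y_pred, zero_division=0))     # 0.0
```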

There are many ways to screw up your cross-validation, but most of them apply to time-ordered data, which is unlikely to be a problem for image recognition. There are still ways in which you could mess up, though. Imagine, for example, that you took your dataset, duplicated every single image, and then did a random train-test split. In that case, if you overfit intentionally, to the point where your neural net was just "remembering", you would get very high test accuracy as well (because most examples in your test set would also exist in your training set), but your model wouldn't necessarily generalise to new data; it could be terrible and you wouldn't know. The sketch below illustrates this.
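A rough sketch of that duplication pitfall (hypothetical data; integers stand in for images):

```python
import numpy as np
from sklearn.model_selection import train_test_split

unique_images = np.arange(1000)                    # stand-ins for 1000 unique images
duplicated = np.concatenate([unique_images] * 2)   # every image appears twice

train, test = train_test_split(duplicated, test_size=0.2, random_state=0)

# For most test items, the duplicate copy landed in the training set, so a
# model that merely memorises the training set still scores well on "test".
overlap = np.isin(test, train).mean()
print(f"{overlap:.0%} of the test set also appears in the training set")
```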

So, as I said, this is a good thing, provided you haven't "screwed up your cross-validation". It doesn't sound like you have, but there are too many ways you can mess up for me to list them all; I still frequently see new ones after 4 years of working as a data scientist.

Finally, just because you get a "good" score of 97% does not mean that this is the best you could be doing. Maybe this is a really easy dataset with a very clear signal in it. The fact that there's a gap between your test and train loss does suggest you have overfit a little bit. You could try training for fewer epochs, or using regularisation or dropout to mitigate overfitting, potentially sacrificing some training loss while improving your test loss.
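For instance, a couple of simple knobs you could turn, assuming the PyTorch setup from the question (the dropout probability and weight-decay value below are arbitrary placeholders, not tuned recommendations):

```python
import torch.nn as nn
import torch.optim as optim
from torchvision import models

model = models.alexnet(pretrained=True)
model.classifier[6] = nn.Linear(model.classifier[6].in_features, 43)  # GTSRB head

# Heavier dropout in the classifier head (indices 0 and 3 are the Dropout
# layers in torchvision's AlexNet).
model.classifier[0] = nn.Dropout(p=0.7)
model.classifier[3] = nn.Dropout(p=0.7)

# L2 regularisation via weight_decay; training for fewer epochs (or early
# stopping on a validation set) is another easy option.
optimizer = optim.SGD(model.parameters(), lr=1e-3, momentum=0.9, weight_decay=1e-4)
```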