Solved – Is it always possible to achieve perfect accuracy on a small dataset

accuracy, neural-networks, overfitting, train

I have read many times that a good debugging step while building a machine learning model is to try to overfit your model to a very small subset of your data. [Here is one such instance][1].

Provided your code is bug-free, is it always possible to achieve perfect or near-perfect performance on the training set when you do this? Could you do it even on a small dataset of random numbers?

I have a model that is achieving significantly better accuracy on my actual data than it does if I feed it random numbers, but it's far from perfect, and no matter how small I make the dataset, how many layers I use, or how big I make the layers, the accuracy stays about the same. What could cause this?
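For concreteness, the kind of sanity check I am talking about looks roughly like this (a minimal sketch in PyTorch; the model and data here are placeholders, not my actual setup):

```python
import torch
import torch.nn as nn

# Tiny subset: e.g. 32 examples -- the network should be able to memorize these.
X = torch.randn(32, 20)            # placeholder features (32 examples, 20 features)
y = torch.randint(0, 2, (32,))     # placeholder binary labels

model = nn.Sequential(
    nn.Linear(20, 64), nn.ReLU(),
    nn.Linear(64, 2),
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for step in range(2000):
    opt.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    opt.step()

acc = (model(X).argmax(dim=1) == y).float().mean()
print(f"final training loss {loss.item():.4f}, training accuracy {acc.item():.2%}")
# If this can't reach ~100% training accuracy on 32 examples, something is wrong
# (a bug, bad initialization, un-normalized inputs, ...).
```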

UPDATE:
Thanks to folks who responded, I understand that it should always be possible to fit a small subset of your data, so I took another look at my implementation.

It turned out there were several small issues:

  • Switching from random uniform weight initialization to Xavier initialization gave a significant bump in my results. (I assumed this would only improve the speed at which training converged to the same crappy result, but it actually improved the overall accuracy.)

  • My data was not fully normalized. Everything was in a range from 0 to ~10, which I initially thought should be good enough, but I got another big bump in performance when I normalized to -1 to 1.

  • My validation set was not representative. My data comes in several different sets from different sources, and it turned out there were distinct "styles" or trends to each set. I was training on a majority of the datasets and evaluating on one particular set. When I shuffled all the individual examples from all sets together, and then drew my validation set randomly from the complete shuffled pool, I started seeing accuracies in the mid and upper 90s!
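In case it helps anyone else, the three fixes amount to roughly the following (a sketch using PyTorch and NumPy; the function names are mine for illustration, not from my actual code):

```python
import numpy as np
import torch.nn as nn

# 1. Xavier (Glorot) initialization instead of plain uniform random weights.
def init_weights(m):
    if isinstance(m, nn.Linear):
        nn.init.xavier_uniform_(m.weight)
        nn.init.zeros_(m.bias)
# usage: model.apply(init_weights)

# 2. Rescale features from roughly [0, 10] down to [-1, 1].
def normalize(X):
    lo, hi = X.min(axis=0), X.max(axis=0)
    return 2 * (X - lo) / (hi - lo) - 1   # assumes no constant columns

# 3. Shuffle examples from *all* sources together before carving out the
#    validation set, so one "style" of data doesn't end up only in validation.
def shuffled_split(X, y, val_frac=0.2, seed=0):
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_val = int(len(X) * val_frac)
    val, train = idx[:n_val], idx[n_val:]
    return X[train], y[train], X[val], y[val]
```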

Best Answer

In theory, a neural network can approximate essentially any (continuous) function to arbitrary precision. In fact, you only need a single hidden layer!

https://en.wikipedia.org/wiki/Universal_approximation_theorem
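In its classic form (Cybenko-style, for a sigmoidal activation $\sigma$; later generalized to any non-polynomial activation), the theorem says that for any continuous $f$ on a compact set $K \subset \mathbb{R}^n$ and any $\varepsilon > 0$ there is a single-hidden-layer network that stays within $\varepsilon$ of $f$ everywhere on $K$:

$$
\left|\, f(x) - \sum_{i=1}^{N} \alpha_i \,\sigma\!\left(w_i^{\top} x + b_i\right) \right| < \varepsilon
\qquad \text{for all } x \in K .
$$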

So the answer is definitely YES: it's always possible to achieve perfect or near-perfect performance on any training set, no matter how small or how big (as long as no two identical inputs carry conflicting labels).

Have you:

  • Actually looked at the training examples that your network fails on? Maybe there's a pattern?

It's no good randomly adjusting neural network parameters. You said your dataset is small, so why not work a bit harder and pull out the training examples that your network can't predict/classify? There should be a reason: maybe you have a bug? Maybe those examples are just nonsense, or outliers? Maybe they are pure random noise that your network shouldn't be expected to fit anyway? Maybe you just need more iterations?

Please look at those failed examples instead of guessing. Machine learning is more than just running the same thing again and again.
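Something as simple as the following helper (a sketch; `model`, `X_train`, `y_train` stand in for your own objects) lets you inspect exactly which examples the network keeps getting wrong:

```python
import torch

def show_failures(model, X, y, k=10):
    """Print up to k training examples the model currently misclassifies."""
    with torch.no_grad():
        preds = model(X).argmax(dim=1)
    wrong = (preds != y).nonzero(as_tuple=True)[0]
    print(f"{len(wrong)} of {len(y)} training examples are misclassified")
    for i in wrong[:k]:
        print("index:", i.item(), "true:", y[i].item(), "predicted:", preds[i].item())

# usage: show_failures(model, X_train, y_train)
```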