Neural Network Training – Can a Neural Network be Trained with Fewer Samples than Weights?

Tags: neural-networks, overfitting, underdetermined

First of all: I know there is no general rule for the number of samples required to train a neural network. It depends on far too many factors, such as the complexity of the task, the noise in the data, and so on. And the more training samples I have, the better my network will be.

But I was wondering: is it theoretically possible to train a neural network with fewer training samples than weights, if I assume my task is "simple" enough? Does anybody know an example where this worked out?
Or will such a network almost surely perform poorly?

If I consider, for example, polynomial regression, I can't uniquely determine a polynomial of degree 4 (i.e., one with 5 free parameters) from only 4 data points.
Is there a similar rule for neural networks, taking the number of weights as the number of free parameters?
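As a quick illustration of the polynomial analogy (my own sketch, not part of the original post), the NumPy snippet below sets up the degree-4 fit to 4 points as a linear system. The data values are made up; the point is that the design matrix has rank 4 but 5 unknowns, so infinitely many degree-4 polynomials interpolate the points exactly.

```python
import numpy as np

# Hypothetical data: 4 points, degree-4 polynomial -> 5 coefficients, 4 equations.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 0.5, 3.0])

# Vandermonde design matrix: one column per coefficient (degree 0..4).
A = np.vander(x, N=5, increasing=True)   # shape (4, 5): underdetermined system

# Least squares returns *one* of infinitely many exact interpolants
# (the minimum-norm solution); rank < number of coefficients confirms
# the system is underdetermined.
coeffs, residuals, rank, _ = np.linalg.lstsq(A, y, rcond=None)
print("rank:", rank, "of", A.shape[1], "unknown coefficients")      # rank: 4 of 5
print("max interpolation error:", np.max(np.abs(A @ coeffs - y)))   # ~0
```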

Best Answer

People do this all the time with large networks. For example, the famous AlexNet network has about 60 million parameters, while the ImageNet ILSVRC dataset it was originally trained on has only 1.2 million images.
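To make that parameter-to-sample ratio concrete, here is a small sketch (mine, not the answerer's) that counts the parameters of the standard torchvision AlexNet; the 1.2 million figure is simply the dataset size quoted above.

```python
from torchvision.models import alexnet

# Instantiate AlexNet without pretrained weights just to count its parameters
# (weights=None requires a reasonably recent torchvision version).
model = alexnet(weights=None)
n_params = sum(p.numel() for p in model.parameters())

n_images = 1_200_000  # ImageNet ILSVRC training-set size as cited in the answer
print(f"AlexNet parameters: {n_params:,}")                    # roughly 61 million
print(f"parameters per training image: {n_params / n_images:.0f}")
```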

The reason you don't fit a 5-parameter polynomial to 4 data points is that such a model can always find a function that exactly fits the data points but does nonsensical things elsewhere. Well, as was noted recently, AlexNet and similar networks can fit arbitrary random labels applied to ImageNet and simply memorize them all, presumably because they have so many more parameters than training points. But something about the priors encoded in the network, combined with the stochastic gradient descent optimization process, means that, in practice, these models can still generalize well to new data points when you give them real labels. We still don't really understand why that happens.
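The same phenomenon shows up at toy scale. Below is a hedged PyTorch sketch (my own, not from the original answer): a small MLP with over a thousand weights is trained on only 20 samples of a simple 1-D function, and under these assumptions it still tends to generalize to unseen points. The architecture, learning rate, and step count are illustrative choices, not anything prescribed by the answer.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical toy task: learn y = sin(x) from only 20 training samples
# with a network that has far more weights than samples.
x_train = torch.linspace(-3, 3, 20).unsqueeze(1)
y_train = torch.sin(x_train)

model = nn.Sequential(
    nn.Linear(1, 32), nn.Tanh(),
    nn.Linear(32, 32), nn.Tanh(),
    nn.Linear(32, 1),
)
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params} parameters vs. {len(x_train)} training samples")  # 1153 vs 20

opt = torch.optim.Adam(model.parameters(), lr=1e-2)
for step in range(2000):
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(x_train), y_train)
    loss.backward()
    opt.step()

# Evaluate on unseen points: despite being heavily overparameterized,
# the fitted function is typically smooth and close to sin(x) on this simple task.
x_test = torch.linspace(-3, 3, 200).unsqueeze(1)
with torch.no_grad():
    test_mse = nn.functional.mse_loss(model(x_test), torch.sin(x_test))
print(f"test MSE on unseen points: {test_mse.item():.4f}")
```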
