Solved – Which elements of a Neural Network can lead to overfitting

machine-learning, mathematical-statistics, neural-networks, overfitting, predictive-models

I am very new to neural networks. I am running a simple ANN with Keras (TensorFlow backend) on a dataset with around 5000 observations and 4 features.

I am experimenting with different parameters of the network and then plotting the training and test error. For instance, I increased/decreased the batch size, the number of epochs, the amount of data used, the number of nodes in the first (and only) hidden layer, and the number of hidden layers.
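
For reference, here is a minimal sketch of the kind of setup I mean, with synthetic stand-in data (my real data has ~5000 observations and 4 features) and placeholder hyperparameter values:

```python
import numpy as np
import matplotlib.pyplot as plt
from tensorflow import keras
from tensorflow.keras import layers

# Synthetic stand-in data: ~5000 observations, 4 features.
rng = np.random.default_rng(0)
X = rng.random((5000, 4))
y = X @ np.array([1.0, -2.0, 0.5, 3.0]) + 0.1 * rng.standard_normal(5000)
X_train, X_test = X[:4000], X[4000:]
y_train, y_test = y[:4000], y[4000:]

# A simple ANN with a single hidden layer (placeholder sizes).
model = keras.Sequential([
    keras.Input(shape=(4,)),
    layers.Dense(16, activation="relu"),
    layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

# Record the error on both sets after every epoch.
history = model.fit(X_train, y_train,
                    validation_data=(X_test, y_test),
                    epochs=100, batch_size=32, verbose=0)

# Plot training vs. test error; a widening gap suggests overfitting.
plt.plot(history.history["loss"], label="train")
plt.plot(history.history["val_loss"], label="test")
plt.xlabel("epoch")
plt.ylabel("MSE")
plt.legend()
plt.show()
```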

Plotting the training and test error suggests that some of them may lead to overfitting. So here is my question:

Which of these features of a NN COULD lead to overfitting? And which ones am I missing? [in general!]

  1. Increasing/Decreasing the batch size.
  2. Increasing/Decreasing the number of epochs.
  3. Increasing/Decreasing the number of neurons in the first hidden layer (or other layers).
  4. Increasing/Decreasing the number of hidden layers.

My guess is that increasing the number of neurons and layers can lead to overfitting, that increasing the number of epochs doesn't, and that increasing the batch size could.

Is this correct? Regardless of the number of samples we have, which of these COULD lead to overfitting? And why?

Furthermore, which other features of a neural network could lead to overfitting?

Best Answer

Increasing the number of hidden units and/or layers may lead to overfitting because it makes it easier for the neural network to memorize the training set, that is, to learn a function that fits the training data perfectly but does not generalize to unseen data.
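
To make "capacity" concrete, here is a small sketch (with hypothetical layer sizes) comparing the number of trainable parameters of a narrow and a wide network; more free parameters mean more capacity to memorize the training set:

```python
from tensorflow import keras
from tensorflow.keras import layers

def make_model(hidden_units, n_hidden_layers, n_features=4):
    """Simple fully connected network with a single regression output."""
    model = keras.Sequential([keras.Input(shape=(n_features,))])
    for _ in range(n_hidden_layers):
        model.add(layers.Dense(hidden_units, activation="relu"))
    model.add(layers.Dense(1))
    return model

small = make_model(hidden_units=8, n_hidden_layers=1)
large = make_model(hidden_units=128, n_hidden_layers=3)

print(small.count_params())  # 49 trainable parameters
print(large.count_params())  # 33793 trainable parameters
```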

Regarding the batch size: combined with the learning rate, the batch size determines how fast you learn (i.e., converge to a solution). Bad choices of these parameters usually lead to slow learning or failure to converge, not to overfitting.

The number of epochs is the number of times you iterate over the whole training set. As a result, if your network has a large capacity (many hidden units and hidden layers), the longer you train, the more likely you are to overfit. To address this you can use early stopping: instead of training for a fixed number of epochs, you train your neural network for as long as the error on an external validation set keeps decreasing.
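
In Keras (which the question uses), early stopping is available as a built-in callback; a minimal sketch, assuming a compiled `model` and training arrays as above:

```python
from tensorflow import keras

early_stop = keras.callbacks.EarlyStopping(
    monitor="val_loss",          # watch the error on the validation set
    patience=10,                 # tolerate 10 epochs without improvement
    restore_best_weights=True,   # roll back to the best epoch seen
)

history = model.fit(X_train, y_train,
                    validation_split=0.2,  # hold out part of the training data
                    epochs=500,            # upper bound; the callback stops earlier
                    callbacks=[early_stop])
```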

In addition, to prevent overfitting in general you should use regularization; common techniques include L1 or L2 penalties on the weights and/or dropout. It is better to have a neural network with more capacity than necessary and use regularization to prevent overfitting than to try to tune the number of hidden units and layers exactly right.
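
A sketch of both techniques in Keras; the penalty strength (1e-3) and dropout rate (0.5) are illustrative values that should be tuned, e.g. on a validation set:

```python
from tensorflow import keras
from tensorflow.keras import layers, regularizers

model = keras.Sequential([
    keras.Input(shape=(4,)),
    # L2 penalty on the hidden layer's weights (use regularizers.l1 for L1)
    layers.Dense(64, activation="relu",
                 kernel_regularizer=regularizers.l2(1e-3)),
    # Dropout: randomly zero out 50% of the units during training
    layers.Dropout(0.5),
    layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
```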
