Solved – Adding more data to counteract overfitting results in worse validation accuracy

deep-learning, machine-learning

I am currently trying to classify clothes for my final project at school. My problem is that after I gathered more data to counteract overfitting, the validation accuracy dropped from 60% to 45%. Below I explain in detail what I did. I use the following network layout:

Network Layout

I have five different clothing classes: T-Shirt, Pullover, Hoodie, Jeans and Shorts.

I first gathered data from Image-net.org. I had about 700 images per class. I then started training the network, resulting in the following graph:
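For reference, a per-class train/validation split like the one implied by the training graphs can be sketched as follows (plain Python; the 80/20 ratio and the file names are assumptions for illustration, not details from the question):

```python
import random

def stratified_split(files_by_class, val_frac=0.2, seed=0):
    """Split each class's image list separately, so the validation
    set keeps the same class balance as the training set."""
    rng = random.Random(seed)
    train, val = {}, {}
    for cls, files in files_by_class.items():
        files = files[:]          # copy so the caller's lists stay intact
        rng.shuffle(files)
        k = int(len(files) * val_frac)
        val[cls], train[cls] = files[:k], files[k:]
    return train, val

# Hypothetical file lists: ~700 images per class, as in the question.
data = {c: [f"{c}_{i}.jpg" for i in range(700)]
        for c in ["T-Shirt", "Pullover", "Hoodie", "Jeans", "Shorts"]}
train, val = stratified_split(data)
# With val_frac=0.2 this gives 560 training and 140 validation
# images per class.
```

Splitting per class rather than over the pooled images keeps the validation accuracy comparable across classes even when class sizes later change.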

First Training

Clearly there was overfitting happening, so I gathered more data for the jeans and shorts classes:
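An alternative to gathering extra images for the under-represented classes is to weight the loss per class inversely to its frequency (a minimal sketch in plain Python; the image counts below are hypothetical, not the actual dataset sizes):

```python
def class_weights(counts):
    """Weight each class inversely to its image count, so mistakes
    on rare classes cost more. Normalized so the average weight
    across classes is 1.0."""
    total = sum(counts.values())
    n = len(counts)
    return {c: total / (n * k) for c, k in counts.items()}

# Hypothetical counts before the extra jeans/shorts data was added.
counts = {"T-Shirt": 700, "Pullover": 700, "Hoodie": 700,
          "Jeans": 350, "Shorts": 350}
weights = class_weights(counts)
# Here the rare classes get twice the weight of the common ones
# (1.6 vs. 0.8).
```

Most frameworks accept such a mapping directly, e.g. as a class-weight argument to the training loop or loss function.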

Second Training

While the overfitting was still there, it started much later. However, the validation accuracy also got worse.

I then gathered more training data from Google Images. I now have around 1150 images per class:

Images per class

It resulted in the following training graph:

Third Training

Now the overfitting looks much better. However, the validation accuracy got much worse!
What am I doing wrong here? Is there just not enough training data, or is it something else?

Best Answer

The answer may be in your last graph. After you added the new data, you stopped at 30 epochs, while your previous runs went out to 70 epochs. Notice that the validation loss has not curved back up yet, which tells you your training has not converged. Just run more epochs and see what happens.
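A simple way to check whether the validation loss has started curving back up is a patience-style check over the per-epoch loss history, the same idea early-stopping callbacks use (a minimal sketch in plain Python; the loss values are made up for illustration):

```python
def has_diverged(val_losses, patience=5):
    """Return True if the validation loss has gone `patience`
    consecutive epochs without improving on its best value,
    i.e. the curve has started bending back up."""
    best = float("inf")
    epochs_since_best = 0
    for loss in val_losses:
        if loss < best:
            best = loss
            epochs_since_best = 0
        else:
            epochs_since_best += 1
            if epochs_since_best >= patience:
                return True
    return False

# Still decreasing at the last epoch: keep training.
print(has_diverged([1.2, 0.9, 0.7, 0.6, 0.55, 0.52]))            # False

# Loss bottomed out at epoch 3 and then rose for 5 epochs:
# training has converged and overfitting has set in.
print(has_diverged([1.2, 0.7, 0.5, 0.6, 0.65, 0.7, 0.75, 0.8]))  # True
```

Until this check fires, the run in the last graph has simply been cut short; comparing its validation accuracy against the 70-epoch runs is premature.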