I am currently trying to classify clothes for my final project in school. My problem is that after I gathered more data to counteract overfitting, the validation accuracy dropped from 60% to 45%. Below I explain in detail what I did. I use the following network layout:
I have five different clothing classes: T-Shirt, Pullover, Hoodie, Jeans and Shorts.
I first gathered data from Image-net.org. I had about 700 images per class. I then started training the network, resulting in the following graph:
Clearly there was overfitting happening so I gathered more data for the jeans and shorts:
While the overfitting was still there, it started much later. However, the validation accuracy also got worse.
I then gathered more training data from Google Images. I now have around 1150 images per class:
It resulted in the following training graph:
Now the overfitting started to look much better. However, the validation accuracy got much worse!
What am I doing wrong here? Is there just not enough training data or is it something else?
Best Answer
The answer may be in your last graph. After you added the new data, you stopped at 30 epochs, while your previous runs went out to 70 epochs. Notice that the validation loss has not curved back up yet, which tells you that your training has not converged. Just run more epochs and see what happens.
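If you want to check this from the numbers instead of eyeballing the plot, something like the following could work. It is only a rough sketch: the function name `validation_loss_turned_up`, the `patience` value of 5, and the loss values in the usage lines are all made up for illustration and are not taken from your runs. It assumes you have the per-epoch validation losses available as a plain list.

```python
def validation_loss_turned_up(val_losses, patience=5):
    """Return True if the lowest validation loss is at least `patience`
    epochs old, i.e. the curve has bottomed out and started to rise.

    `val_losses` is a list with one validation loss per epoch.
    `patience` is an assumed threshold; tune it to how noisy your curve is.
    """
    if not val_losses:
        return False
    # Index of the epoch with the lowest validation loss (first one on ties).
    best_epoch = min(range(len(val_losses)), key=val_losses.__getitem__)
    epochs_since_best = len(val_losses) - 1 - best_epoch
    return epochs_since_best >= patience


# Hypothetical curves, for illustration only.
still_falling = [1.6 - 0.02 * i for i in range(30)]        # keeps decreasing
bottomed_out = [1.0, 0.8, 0.7, 0.75, 0.8, 0.9, 1.0, 1.1]   # minimum at epoch 2

print(validation_loss_turned_up(still_falling))  # not converged yet
print(validation_loss_turned_up(bottomed_out))   # has curved back up
```

As long as this returns False for your latest run, the comparison with the 70-epoch runs is not a fair one. If you happen to be training with Keras (an assumption, since the question does not say), the built-in `EarlyStopping` callback with `monitor='val_loss'` and a `patience` setting applies the same idea during training.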