Solved – Gridsearch vs Crossvalidation with Keras and Deep Learning

Tags: deep learning, keras, machine learning, neural networks, scikit-learn

I have a recurrent LSTM network that classifies text, and I want to find the best hyperparameters. I use GridSearchCV from scikit-learn together with a classifier that uses generators to transform my sparse vectors into dense ones.

Code from here

While I can execute the grid search, I have to define the number of epochs up front. Furthermore, the default implementation uses 3-fold cross-validation. Since scikit-learn manages the splitting of my dataset, I can only get the validation score after all epochs (instead of after each epoch).
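For reference, the setup described above looks roughly like this. This is a minimal, self-contained sketch: an `MLPClassifier` stands in for the wrapped Keras model so the example runs without Keras installed; with Keras you would pass a `KerasClassifier` built from your model function, fixing `epochs` the same way `max_iter` is fixed here. The parameter names and data are made up for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

# Toy data standing in for the text-classification features
X, y = make_classification(n_samples=200, n_features=20, random_state=0)

# The number of training iterations ("epochs") is fixed before the search,
# which is exactly the limitation discussed above.
param_grid = {
    "hidden_layer_sizes": [(16,), (32,)],
    "alpha": [1e-4, 1e-3],
}
grid = GridSearchCV(
    MLPClassifier(max_iter=50, random_state=0),  # epoch count fixed up front
    param_grid,
    cv=3,                  # the 3-fold cross-validation mentioned above
    scoring="accuracy",
)
grid.fit(X, y)

# Only the score after all iterations is available per candidate,
# not a per-epoch validation curve.
print(grid.best_params_)
```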

My problem is that my network converges quickly to about 90% accuracy in most cases, but then varies by about 5% from epoch to epoch. So my grid search results are not really reliable (each candidate is scored by whatever the accuracy happens to be after x epochs).

Before I started with grid search, I just used a train/test split and compared parameters by the maximum accuracy achieved on the validation data, which was scored after each epoch. In the picture below you can see the loss over 20 epochs.

[figure: training/validation loss over 20 epochs]
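The per-epoch comparison described above amounts to ranking configurations by their best epoch rather than their last. A small sketch (the accuracy values and configuration names are invented for illustration, standing in for a Keras `History.history["val_accuracy"]` list per run):

```python
# Per-epoch validation accuracies for two hypothetical hyperparameter settings
histories = {
    "lstm_units=64":  [0.70, 0.85, 0.90, 0.88, 0.92, 0.89],
    "lstm_units=128": [0.72, 0.84, 0.91, 0.93, 0.90, 0.88],
}

# Compare configurations by the best accuracy reached at *any* epoch,
# rather than the accuracy after the final epoch.
best_config = max(histories, key=lambda k: max(histories[k]))
print(best_config, max(histories[best_config]))
```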

My question is: does it make sense to use grid search with cross-validation here, or would you use the train/test split and determine the best hyperparameters by looking at the best accuracy achieved after each epoch?

Best Answer

Using grid search and cross-validation is fine overall in this context, but you need to tweak the procedure slightly for your use case. You're right to be concerned about fixing the number of epochs ahead of time and scoring on the final, rather than the best, value of the loss.

To fix this, use an EarlyStopping callback, which halts training once the validation loss stops improving. That way the reported loss more closely reflects the "best" model, rather than one that has already begun to overfit.

https://keras.io/callbacks/#earlystopping
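The logic behind early stopping can be sketched framework-independently. The loop below simulates it over a sequence of per-epoch validation losses; in Keras, `EarlyStopping(monitor="val_loss", patience=...)` applies the same rule during real training (the function name and the example losses here are invented for illustration):

```python
def train_with_early_stopping(val_losses, patience=2):
    """Return (best_epoch, best_loss) for a run of per-epoch validation losses,
    stopping once the loss has failed to improve for `patience` epochs."""
    best_loss = float("inf")
    best_epoch = 0
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
            wait = 0
        else:
            wait += 1
            if wait >= patience:  # stop: no improvement for `patience` epochs
                break
    return best_epoch, best_loss

# The loss dips, then rises as the model starts to overfit;
# training stops near the minimum instead of running all epochs.
print(train_with_early_stopping([1.0, 0.8, 0.7, 0.75, 0.72, 0.9, 0.95]))
```

This is why the grid-search scores become more reliable with the callback in place: each candidate is effectively scored near its best epoch rather than at an arbitrary fixed one.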