"To produce the estimates on the test set do I simply average the weights and biases from each of the 10 different calibrated models and use this parametrization to produce outputs to compare with my test set for the target function?"
No. Cross-validation is a procedure for estimating the test performance of a method for producing a model, rather than of the model itself. So the best thing to do is to perform k-fold cross-validation to determine the best hyper-parameter settings, e.g. the number of hidden units, the values of the regularisation parameters, etc. Then train a single network on the whole calibration set (or train several and pick the one with the best value of the regularised training criterion, to guard against local minima). Evaluate the performance of that model using the test set. A sketch of this workflow follows below.
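For concreteness, here is a minimal sketch of that workflow in Python, using scikit-learn's `KFold` and `MLPRegressor`. The variable names (`X_cal`, `y_cal`, `X_test`, `y_test`) and the hyper-parameter grid are my own illustrative assumptions, not part of the answer above:

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.neural_network import MLPRegressor

def cv_score(n_hidden, alpha, X, y, n_splits=10):
    """Average validation MSE of one hyper-parameter setting."""
    errors = []
    for train_idx, val_idx in KFold(n_splits=n_splits, shuffle=True,
                                    random_state=0).split(X):
        model = MLPRegressor(hidden_layer_sizes=(n_hidden,), alpha=alpha,
                             max_iter=2000, random_state=0)
        model.fit(X[train_idx], y[train_idx])
        errors.append(np.mean((model.predict(X[val_idx]) - y[val_idx]) ** 2))
    return np.mean(errors)

# X_cal, y_cal, X_test, y_test: your calibration and test arrays (assumed given).
# Score each candidate setting with 10-fold CV on the calibration set ...
grid = [(h, a) for h in (5, 10, 20) for a in (1e-4, 1e-2)]
best_h, best_alpha = min(grid, key=lambda p: cv_score(*p, X_cal, y_cal))

# ... then retrain ONCE on the whole calibration set with the chosen setting.
final_model = MLPRegressor(hidden_layer_sizes=(best_h,), alpha=best_alpha,
                           max_iter=2000, random_state=0).fit(X_cal, y_cal)
test_mse = np.mean((final_model.predict(X_test) - y_test) ** 2)
```

The important point is that the fold-models are only used to score hyper-parameter settings; the model you actually evaluate on the test set is trained once, from scratch, on all the calibration data.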
In the case of neural networks, averaging the weights and biases of individual models won't work, as different models will settle on different internal representations, so the corresponding hidden units of different networks will represent different (distributed) concepts. If you average their weights, the mean of those concepts will be meaningless.
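A quick way to see this is the permutation symmetry of the hidden layer: relabelling the hidden units of a one-hidden-layer network leaves its function unchanged, yet averaging the original and permuted weights gives a different function. A small numpy demonstration (my own illustration, not from the answer above):

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)   # input -> hidden
w2, b2 = rng.normal(size=4), rng.normal()              # hidden -> output

def net(x, W1, b1, w2, b2):
    return np.tanh(W1 @ x + b1) @ w2 + b2

perm = [2, 0, 3, 1]                        # relabel the hidden units
W1p, b1p, w2p = W1[perm], b1[perm], w2[perm]

x = rng.normal(size=3)
print(net(x, W1, b1, w2, b2))              # original network
print(net(x, W1p, b1p, w2p, b2))           # identical output
print(net(x, (W1 + W1p) / 2, (b1 + b1p) / 2,
          (w2 + w2p) / 2, b2))             # averaged weights: different output
```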
"Determining the number of epochs by, e.g., averaging the number of epochs across the folds and using it for the test run later on?"
Shortest possible answer: Yes!
But let me add some context...
I believe you are referring to Section 7.8, pages 246ff, on early stopping in the Deep Learning book. The procedure described there, however, is significantly different from yours. Goodfellow et al. suggest splitting your data into three sets first: a training, a dev, and a test set. Then you train (on the training set) until the model's error (on the dev set) starts to increase, at which point you stop. Finally, you take the trained model that had the lowest dev set error and evaluate it on the test set. No cross-validation is involved at all.
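For concreteness, a minimal sketch of that procedure, assuming pre-split arrays `X_train`/`y_train`, `X_dev`/`y_dev`, `X_test`/`y_test` and a patience window of 10 epochs (both my own assumptions):

```python
import copy
import numpy as np
from sklearn.neural_network import MLPRegressor

model = MLPRegressor(hidden_layer_sizes=(10,), random_state=0)
best_err, best_model, since_best = np.inf, None, 0
for epoch in range(500):                        # hard cap on epochs
    model.partial_fit(X_train, y_train)         # one pass over the training data
    dev_err = np.mean((model.predict(X_dev) - y_dev) ** 2)
    if dev_err < best_err:                      # new best: snapshot the model
        best_err, best_model, since_best = dev_err, copy.deepcopy(model), 0
    else:
        since_best += 1
        if since_best >= 10:                    # patience exhausted: stop early
            break

# Only the snapshot with the lowest dev error ever touches the test set.
test_mse = np.mean((best_model.predict(X_test) - y_test) ** 2)
```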
However, you seem to be trying to do early stopping (ES), cross-validation (CV), and model evaluation all on the same set. That is, you seem to be using all your data for CV, training on each split with ES, and then using the average performance over those CV splits as your final evaluation result. If that is the case, that is indeed stark over-fitting (and certainly not what Goodfellow et al. describe), and your approach gives you exactly the opposite of what ES is meant for, namely a regularization technique to prevent over-fitting. If it is not clear why: because you have "peeked" at your final evaluation instances during training time to figure out when to stop ("early") training. That is, you are optimizing against the evaluation instances during training, which is (over-)fitting your model on that evaluation data, by definition.
So by now, I hope to have answered your other [two] questions.
The answer by the higgs broson (to your last question, as cited above) already gives a meaningful way to combine CV and ES that will save you some training time: you could split your full data into only two sets, a dev and a test set, and use the dev set for CV while applying ES on each split. That is, you train on each split of your dev set and stop once the error on the instances you set aside for evaluating that split has reached its lowest point [1]. Then you average the number of epochs needed to reach that lowest error across the splits and train on the full dev set for that (averaged) number of epochs. Finally, you validate the outcome on the test set you set aside and haven't touched yet.
[1] Though unlike the higgs broson, I would recommend evaluating after every epoch, for two reasons: (1) compared to training, the evaluation time will be negligible; (2) imagine your minimum error occurs at epoch 51, but you only evaluate at epochs 50 and 60. It isn't unlikely that the error at epoch 60 will be lower than at epoch 50; yet you would then choose 60 as your epoch parameter, which is clearly sub-optimal and in fact runs somewhat against the purpose of using ES in the first place.
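Putting that recipe together, a rough sketch (the variable names `X_dev`, `y_dev`, `X_test`, `y_test` and the architecture are my own assumptions), evaluating after every epoch as recommended in [1]:

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.neural_network import MLPRegressor

MAX_EPOCHS = 500
best_epochs = []
# X_dev/y_dev, X_test/y_test: assumed dev and test arrays.
for train_idx, val_idx in KFold(n_splits=10, shuffle=True,
                                random_state=0).split(X_dev):
    model = MLPRegressor(hidden_layer_sizes=(10,), random_state=0)
    val_errors = []
    for _ in range(MAX_EPOCHS):                 # one partial_fit = one epoch
        model.partial_fit(X_dev[train_idx], y_dev[train_idx])
        pred = model.predict(X_dev[val_idx])
        val_errors.append(np.mean((pred - y_dev[val_idx]) ** 2))
    best_epochs.append(np.argmin(val_errors) + 1)   # epoch of the lowest error

# Retrain on the FULL dev set for the averaged number of epochs ...
n_epochs = int(round(np.mean(best_epochs)))
final = MLPRegressor(hidden_layer_sizes=(10,), random_state=0)
for _ in range(n_epochs):
    final.partial_fit(X_dev, y_dev)

# ... and only now touch the held-out test set.
test_mse = np.mean((final.predict(X_test) - y_test) ** 2)
```

In practice you would break each fold's loop once the error has stopped improving for a while rather than always running `MAX_EPOCHS` epochs, but the full loop keeps the sketch short.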
Best Answer
Your pseudocode looks right to me. It's also possible to use cross-validation at the top level as well, giving you a double loop: in the outer loop you create train/test sets from the folds, and in the inner loop you further split each training set into train/validate portions. I would only recommend this if your dataset is very small. It reduces the variance in the estimate of the final best model's performance, at the cost of roughly 10x the running time.
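A minimal sketch of that double loop (the hyper-parameter candidates and the model class are illustrative assumptions on my part):

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.neural_network import MLPRegressor

# X, y: the full dataset (assumed given).
outer_scores = []
for trainval_idx, test_idx in KFold(n_splits=10, shuffle=True,
                                    random_state=0).split(X):
    X_tv, y_tv = X[trainval_idx], y[trainval_idx]

    def inner_score(n_hidden):
        """Inner loop: 10-fold CV score of one setting on the train/validate data."""
        errs = []
        for tr, va in KFold(n_splits=10, shuffle=True,
                            random_state=1).split(X_tv):
            m = MLPRegressor(hidden_layer_sizes=(n_hidden,),
                             max_iter=2000, random_state=0)
            m.fit(X_tv[tr], y_tv[tr])
            errs.append(np.mean((m.predict(X_tv[va]) - y_tv[va]) ** 2))
        return np.mean(errs)

    best = min((5, 10, 20), key=inner_score)    # pick hyper-parameters inside the fold
    model = MLPRegressor(hidden_layer_sizes=(best,),
                         max_iter=2000, random_state=0).fit(X_tv, y_tv)
    outer_scores.append(np.mean((model.predict(X[test_idx]) - y[test_idx]) ** 2))

print(np.mean(outer_scores))    # estimate of the whole model-building procedure
```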
Some CV implementations use heuristics to choose the model. Instead of taking the one with the lowest validation loss, they take the one with the lowest complexity (in some sense, such as the number of hidden nodes) that is within one or two standard errors of the best.
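That heuristic, often called the "one-standard-error rule", could be sketched like this (my own illustration):

```python
import numpy as np

def one_se_choice(complexities, fold_errors):
    """fold_errors[i][j]: validation error of complexities[i] on fold j.
    complexities are assumed sorted from simplest to most complex."""
    means = np.array([np.mean(e) for e in fold_errors])
    ses = np.array([np.std(e, ddof=1) / np.sqrt(len(e)) for e in fold_errors])
    best = np.argmin(means)
    threshold = means[best] + ses[best]
    # Return the simplest model whose mean error is within 1 SE of the best.
    for c, m in zip(complexities, means):
        if m <= threshold:
            return c
    return complexities[best]

# e.g. hidden-unit counts 5, 10, 20 with their 10-fold error lists:
# one_se_choice([5, 10, 20], [errs_5, errs_10, errs_20])
```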