Solved – Neural Networks – Epochs with 10-fold Cross Validation – doing something wrong

cross-validation, neural-networks

I am using a neural network (ResNet-18) to classify sounds from the UrbanSound8K dataset (https://urbansounddataset.weebly.com/).

As recommended by the dataset creators, I am using 10-fold cross validation with the pre-prepared folds they provide.

Since this is a neural network, I also train over epochs. Within each epoch, I run a full 10-fold cross-validation rotation (9 folds for training, 1 fold for validation).

The loss is categorical cross-entropy. I collect the following stats:

  • Per epoch average train loss

  • per epoch average train accuracy

  • per epoch average valid accuracy

  • per fold train loss (for example, with 10 folds per epoch, fold #55 is the 5th fold of the 6th epoch)

  • per fold train accuracy

  • per fold validation accuracy

The validation accuracy (per-fold and per-epoch) approaches 100% very quickly, within 9 epochs (each epoch containing a full 10-fold rotation).

I use all of the data in each fold for both training and validation.

My questions are:

  • Is there something wrong with my approach?

  • Is it correct to combine epochs with k-fold cross validation, using all data in each fold, when training neural networks?

  • Could it be that the weights 'remember' the data between epochs, which is why the network learns so quickly when epochs are used? Consequently, is the approach overfitting, given that all data in all folds are being used?

  • Instead of using the full dataset, is it better to train on mini-batch samples drawn from the pool of 9 training folds, report validation accuracy on the full 10th (validation) fold, and then report the average validation accuracy per epoch (and perhaps per-fold and per-epoch training loss as well)?

  • Or is the mini-batch approach over many epochs just a slower route to the same place as training on the full dataset of all training folds, eventually leading to the same, probably overfit, results?
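To make the leakage concern in the questions above concrete, here is a minimal sketch (hypothetical function name, not from the original post) simulating the rotation scheme described, where the validation fold cycles through all 10 folds within each epoch. It shows that after even a single epoch of rotation, every fold has already been used for training, so no later "validation" fold is truly unseen:

```python
def folds_seen_in_training(n_folds, n_epochs):
    """Simulate per-epoch fold rotation; return fold indices ever trained on."""
    seen = set()
    for _ in range(n_epochs):
        for val_fold in range(n_folds):  # rotate the held-out fold
            # the other 9 folds are used for training this round
            seen.update(f for f in range(n_folds) if f != val_fold)
    return seen

# After one epoch of rotation, all 10 folds have been trained on:
print(sorted(folds_seen_in_training(10, 1)))  # → [0, 1, 2, ..., 9]
```

This is why the weights can "remember" the validation data: by the second epoch, every sample the model is validated on has already contributed to a gradient update.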

Please see the tensorboard graphs below for the trends of these stats:

  1. Validation accuracy per epoch (total 9 epochs, 99.78%)

  2. Validation accuracy per fold (reaches 100% in the 51st fold)

  3. Train accuracy per epoch (reaches 100%)

  4. Train accuracy per fold (reaches 100%)

  5. Train loss per epoch

  6. Train loss per fold

Best Answer

Your approach is incorrect: during a training run, you should NEVER allow your validation fold to become one of the training folds. As an example, say you will train your model for 30 epochs. Select the 9 folds to be used for training and the 1 fold to be used for validation, then train for all 30 epochs and DO NOT interchange any training and validation folds after any individual epoch. If you do interchange them, your evaluation is no longer valid: the model will have seen all of the data during training, which means it will eventually reach 100% accuracy given enough epochs.

After these 30 epochs, select a new 9-fold training combination and a new validation fold, re-initialize the model, and repeat the process!
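The corrected protocol can be sketched as follows. This is an illustrative skeleton, not the asker's actual pipeline: `init_model`, `train_one_epoch`, and `evaluate` are placeholder stand-ins for a real ResNet-18 training loop, tracking only which folds each model has seen.

```python
def init_model():
    # Placeholder for a freshly initialized network.
    return {"trained_on": set()}

def train_one_epoch(model, train_folds):
    # Placeholder for one epoch of gradient updates on the training folds.
    model["trained_on"].update(train_folds)

def evaluate(model, val_fold):
    # A valid run never trains on its own validation fold.
    assert val_fold not in model["trained_on"]
    return 1.0  # placeholder accuracy

def run_cross_validation(n_folds=10, n_epochs=30):
    """One validation score per fold; the split stays fixed for all epochs."""
    scores = []
    for val_fold in range(n_folds):
        train_folds = [f for f in range(n_folds) if f != val_fold]
        model = init_model()          # fresh weights for every fold
        for _ in range(n_epochs):     # same 9/1 split for all 30 epochs
            train_one_epoch(model, train_folds)
        scores.append(evaluate(model, val_fold))
    return scores

print(len(run_cross_validation()))  # → 10, one held-out score per fold
```

The key design point is that the fold loop is the *outer* loop and the epoch loop is the *inner* loop, with a fresh model per fold; the final reported metric is the average of the 10 held-out scores.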