MATLAB: Custom mini-batches when using ‘trainNetwork’ – Accuracy/Loss much noisier


Hi,
I'm using the trainNetwork function to train a neural net on numeric data for classification. My entire dataset matrix is too big to pass in at once (it crashes out), so I have to feed in custom mini-batches instead. I have code that randomly selects a unique mini-batch from my dataset. I set 'MiniBatchSize' to the number of rows in the mini-batch matrix I'm feeding in and 'MaxEpochs' to 1, so that each call runs that one batch one time only and moves on. I then re-train my net on each batch in an iterative loop. The idea is coded below.
The get_MiniBatch function below is only for illustrative purposes, and the last column of miniBatch holds the labels.
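for epochIdx = 1 : maxNumEpochs
    for miniBatchIdx = 1 : NumMiniBatches
        % Draw one mini-batch; the last column holds the class labels
        miniBatch = get_MiniBatch(DATA);
        % One epoch over a batch-sized dataset means one update per call
        options = trainingOptions('adam', 'MiniBatchSize', size(miniBatch,1), 'MaxEpochs', 1, 'Verbose', 0);
        [Net, trainingMetrics] = trainNetwork(miniBatch(:,1:end-1), categorical(miniBatch(:,end)), layers, options);
        % Reuse the trained layers (with their learned weights) in the next call
        layers = Net.Layers;
    end
end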
However, when I plot trainingMetrics.TrainingLoss and trainingMetrics.TrainingAccuracy across all mini-batches, they are much noisier than the built-in progress plots of training accuracy and loss (unsmoothed) that I get when I give trainNetwork all my data and let it run through the mini-batches automatically. Am I updating the weights etc. correctly just by reassigning layers at the end of my loop each time, as I have here? Or should I also be updating other parameters?
I'm sure it would be much easier just to give trainNetwork all the data and let it do everything, but for now I have to run the mini-batches in loops to reduce computational cost.
Thanks.

Best Answer

Hi Terence,
This is the expected behavior. Training on a single batch that contains all the data gives a smoother curve, because every update uses the gradient of the loss over the whole dataset, whereas each mini-batch gives only a noisy estimate of that gradient. You don't need to train on one full batch, though: mini-batches can also train the network very effectively. Compared with mini-batches, using the full dataset for every update is more computationally expensive, and its noise-free gradient trajectory can land you in a saddle point.
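To see the effect in isolation, here is a small sketch in base MATLAB (no toolbox needed). It is not your network: the synthetic least-squares problem, the sizes, the learning rate, and all variable names are assumptions chosen only to illustrate the point. It runs full-batch and mini-batch gradient descent side by side and compares how much the recorded loss jumps from one iteration to the next.

```matlab
% Toy comparison: full-batch vs mini-batch gradient descent on synthetic data.
% All values below are illustrative assumptions, not taken from the question.
rng(0);                                   % reproducible random numbers
N = 2000; d = 5;                          % samples, features
X = randn(N, d);
wTrue = randn(d, 1);
y = X * wTrue + 0.5 * randn(N, 1);        % noisy linear targets

lr = 0.05; numIter = 200; batchSize = 32;
wFull = zeros(d, 1); wMini = zeros(d, 1);
lossFull = zeros(numIter, 1); lossMini = zeros(numIter, 1);

for it = 1:numIter
    % Full batch: loss and gradient are computed on every sample
    rFull = X * wFull - y;
    lossFull(it) = mean(rFull.^2);
    wFull = wFull - lr * (2 / N) * (X' * rFull);

    % Mini-batch: loss and gradient are computed on a random subset only
    idx = randperm(N, batchSize);
    rMini = X(idx, :) * wMini - y(idx);
    lossMini(it) = mean(rMini.^2);
    wMini = wMini - lr * (2 / batchSize) * (X(idx, :)' * rMini);
end

% Roughness: spread of iteration-to-iteration loss changes late in training
roughFull = std(diff(lossFull(100:end)));
roughMini = std(diff(lossMini(100:end)));
fprintf('roughness, full batch: %.4g, mini-batch: %.4g\n', roughFull, roughMini);
```

Plotting lossFull and lossMini on the same axes shows the same thing visually: both settle around a similar level, but the mini-batch curve keeps jittering because each point is measured on a different small sample.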
You can check the following links to know more about this.
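On whether anything besides the layers needs to be carried over: as far as I know, each call to trainNetwork starts its solver from scratch, so passing Net.Layers back in keeps the learned weights but not Adam's running gradient averages, which may add to the noise you see. If that matters, one option is a custom training loop in which you hold that state yourself. The following is a minimal sketch, assuming Deep Learning Toolbox in a reasonably recent release; the synthetic data, the layer sizes, and the helper name modelLoss are made up for illustration and would need to be replaced by your own data and layers.

```matlab
% Sketch of a custom training loop that keeps the Adam state across batches.
% Data, sizes, and names are illustrative assumptions.
rng(0);
numFeatures = 10; numClasses = 3; N = 600;
X = randn(N, numFeatures);
% Fix the class set once so every mini-batch encodes labels identically
labels = categorical(randi(numClasses, N, 1), 1:numClasses);

layers = [
    featureInputLayer(numFeatures)
    fullyConnectedLayer(32)
    reluLayer
    fullyConnectedLayer(numClasses)
    softmaxLayer];
net = dlnetwork(layers);                  % no classification output layer here

batchSize = 64; numEpochs = 5;
numIterations = numEpochs * ceil(N / batchSize);
lossHistory = zeros(numIterations, 1);
avgGrad = []; avgSqGrad = [];             % Adam state, never reset in the loop
iteration = 0;

for epoch = 1:numEpochs
    order = randperm(N);                  % new shuffle every epoch
    for start = 1:batchSize:N
        idx = order(start:min(start + batchSize - 1, N));
        iteration = iteration + 1;
        XB = dlarray(X(idx, :)', 'CB');            % features x batch
        TB = onehotencode(labels(idx), 2)';        % classes x batch
        [loss, grads] = dlfeval(@modelLoss, net, XB, TB);
        % adamupdate returns the updated moment estimates for the next call
        [net, avgGrad, avgSqGrad] = adamupdate(net, grads, avgGrad, avgSqGrad, iteration);
        lossHistory(iteration) = double(extractdata(loss));
    end
end

function [loss, grads] = modelLoss(net, X, T)
    % Forward pass, cross-entropy loss, and gradients w.r.t. the learnables
    Y = forward(net, X);
    loss = crossentropy(Y, T);
    grads = dlgradient(loss, net.Learnables);
end
```

Only one mini-batch has to be in memory as a dlarray at a time, so the indexing into X could be swapped for whatever routine loads your batches. The per-batch loss will still fluctuate, since that is inherent to mini-batches, but the optimizer no longer restarts at every step.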