Solved – Loss during minibatch gradient descent

gradient descent · neural networks · tensorflow

I have minibatch gradient descent code in TensorFlow for function approximation, but I am unsure when to calculate the loss. First, I create batches for the x and y data, and I shuffle both sets of batches every epoch. I work with two lists: train_batch_loss stores the loss of each batch within an epoch, so I can take the average over all batches and append it to the train_loss list.
My code looks like this:

train_loss = []
for epoch in range(number_of_epochs):
    x_train_batches, y_train_batches = createBatches(batch_size, Xdata, Ydata)
    number_of_batches = len(x_train_batches)
    x_train_batches = shuffle(x_train_batches)
    y_train_batches = shuffle(y_train_batches)
    train_batch_loss = []
    for batch in range(number_of_batches):
        t = sess.run([training], feed_dict = {X:x_train_batches[batch],Y:y_train_batches[batch]})
        train_batch_loss.append(sess.run(cost_function, feed_dict = {X:x_train_batches[batch],Y:y_train_batches[batch]}))
    train_loss.append(sum(train_batch_loss)/len(train_batch_loss))

Is this the correct way to calculate losses for minibatch GD?

Best Answer

The X and Y data should be shuffled together so that the pairings stay consistent within the minibatches; this is not guaranteed in your code because of the two separate shuffle calls. You might also want to compute the loss on an independent validation set (i.e., one that never gets mixed with the training batches across epochs). I assume Xdata and Ydata in your code are training data only, right? Otherwise validation (and, even worse, test) data would get completely mixed in.
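
As a rough sketch of what that could look like: the loop below draws a single index permutation and applies it to both arrays, so each x stays paired with its y, and evaluates the cost on a separate hold-out set once per epoch. It reuses the X, Y placeholders and the training / cost_function ops from your code, and it assumes Xdata and Ydata are NumPy arrays; Xval and Yval are hypothetical NumPy arrays holding a validation split.

import numpy as np

train_loss, val_loss = [], []
for epoch in range(number_of_epochs):
    # One permutation applied to both arrays keeps each x paired with its y.
    perm = np.random.permutation(len(Xdata))
    x_shuffled, y_shuffled = Xdata[perm], Ydata[perm]

    batch_losses = []
    for start in range(0, len(x_shuffled), batch_size):
        x_batch = x_shuffled[start:start + batch_size]
        y_batch = y_shuffled[start:start + batch_size]
        # Run the optimizer and fetch the cost in one call; the cost reported
        # here is measured before this step's weight update is applied.
        _, batch_cost = sess.run([training, cost_function],
                                 feed_dict={X: x_batch, Y: y_batch})
        batch_losses.append(batch_cost)

    train_loss.append(np.mean(batch_losses))
    # Validation loss: only evaluate the cost here, never run the training op.
    val_loss.append(sess.run(cost_function, feed_dict={X: Xval, Y: Yval}))

Fetching training and cost_function in a single sess.run also saves the extra forward pass that your second sess.run call currently does for each batch.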