Solved – Predicting CPU and GPU memory requirements of DNN training

deep learning

Say I have some deep learning model architecture, as well as a chosen mini-batch size. How do I derive from these the expected memory requirements for training that model?

As an example, consider a (non-recurrent) model with input of dimension 1000, 4 fully-connected hidden layers of dimension 100, and an additional output layer of dimension 10. The mini-batch size is 256 examples. How does one determine the approximate memory (RAM) footprint of the training process on the CPU and on the GPU?
If it makes any difference, let's assume the model is trained on a GPU with TensorFlow (and thus uses cuDNN).

Best Answer

@ik_vision's answer describes how to estimate the memory needed to store the weights, but you also need to store the intermediate activations; especially for convolutional networks working with 3D data, these activations are the main part of the memory needed.
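For the weights themselves, a quick tally for this example (my own sketch, assuming standard fully-connected layers with bias terms, not a restatement of @ik_vision's method):

```python
# Parameter count for the example: 1000 -> four hidden layers of 100 -> 10.
dims = [1000, 100, 100, 100, 100, 10]

# Each fully-connected layer stores in_dim * out_dim weights plus out_dim biases.
n_params = sum(d_in * d_out + d_out for d_in, d_out in zip(dims, dims[1:]))

print(n_params)                  # 131410 parameters
print(n_params * 4 / 1e6, "MB")  # ~0.53 MB at 4 bytes per float32
```

Already for this small model, the activation storage worked out below outweighs that once the batch size comes in.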

To analyze your example:

  1. The input needs 1000 elements
  2. After each of the 4 hidden layers you have 100 elements, 400 in total
  3. After the final layer you have 10 elements

In total, for 1 sample you need 1410 elements for the forward pass. For the backward pass you also need gradient information for each of them except the input, which is 410 more, totaling 1820 elements per sample. Multiply by the batch size of 256 to get 465,920 elements.
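A minimal Python sketch of that tally, with the dimensions and batch size taken from the question:

```python
# Activation elements (forward) plus their gradients (backward),
# per sample and for the whole mini-batch.
dims = [1000, 100, 100, 100, 100, 10]
batch_size = 256

forward = sum(dims)        # 1410 activation elements per sample
backward = sum(dims[1:])   # 410 gradient elements (none stored for the input)
per_sample = forward + backward   # 1820
print(per_sample * batch_size)    # 465920 elements for the mini-batch
```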

I said "elements", because the size required per element depends on the data type used. For single precision float32 it is 4B and the total memory needed to store the data blobs will be around 1.8MB.
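And the final conversion, assuming 4 bytes per float32 element:

```python
bytes_needed = 465920 * 4           # 4 bytes per float32 element
print(bytes_needed / 2**20, "MiB")  # ~1.78 MiB, i.e. around 1.8 MB
```

Note that this covers only the data blobs for one forward/backward pass; the weights, their gradients, and framework overhead come on top of it.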