MATLAB: Inexplicable GPU memory usage

I'd like to perform FFTs on multiple arrays that are each much smaller than the total memory on my GPU (e.g. 1 GB arrays on an 8 GB RTX 2070).
When I load just one array onto the GPU and perform an FFT, the drop in free memory is disproportionately large relative to the size of the original array. I would expect this during the operation, but I would also expect the memory to be released afterwards. That doesn't seem to happen: the memory required for the FFT is never freed. Consider the following example, which tracks free memory as 1) an array is placed on the GPU, 2) an in-place FFT is computed along the column dimension, and 3) an in-place inverse FFT is computed along the column dimension.
gpu = gpuDevice;
m(1) = gpu.FreeMemory; % Free memory on GPU
% 3D data array on GPU
A = gpuArray(randn(2048, 64, 2000, 'single'));
% Check free memory:
m(2) = gpu.FreeMemory
% Perform in-place fft:
A = fft(A);
% re-check free memory:
m(3) = gpu.FreeMemory
% Perform in-place ifft:
A = ifft(A);
% final check of free memory:
m(4) = gpu.FreeMemory
After execution I get the following values for m:
m =
1.0e+09 *
6.9215 5.8729 1.6472 0.5986
The first value seems fine, more or less (I never get more than ~7 GB free on my 8 GB card, but I'm not going to complain), and the second value is fine as well: the array takes about 1 GB. But the third value (~1.6 GB free) is very concerning, because it indicates that more than 5 GB are "in use" after the FFT. I would expect 2 GB, since the size of A hasn't changed and the values are now complex, but not 5 GB. Even if 5 GB were required during the FFT, why don't I get 3 GB back afterwards?
Then, after performing the inverse Fourier transform (which I threw in just out of curiosity), another full 1 GB is taken from the pool of free memory, never to return.
Is there a way to free the memory that apparently isn't being used for variable storage on the GPU, without resetting the GPU or clearing the variable (neither of which is a good option)? Or, even better, is there a way to avoid the problem in the first place without taking a major performance hit? MATLAB version is R2019a.
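The numbers above can be checked with a short calculation. This is only a sketch of the arithmetic: the values in m are the ones reported in this question, and the variable names realBytes, complexBytes and usedPerStep are made up for illustration.

```matlab
% Free-memory readings reported in the question, in bytes.
m = 1.0e+09 * [6.9215 5.8729 1.6472 0.5986];

% A 2048-by-64-by-2000 single array uses 4 bytes per element.
realBytes = 2048 * 64 * 2000 * 4;   % 1,048,576,000 bytes, about 1.05 GB
complexBytes = 2 * realBytes;       % complex single uses 8 bytes per element

% Drop in free memory at each step, in GB: [upload, fft, ifft].
usedPerStep = -diff(m) / 1e9;

fprintf('Real array:    %.2f GB\n', realBytes / 1e9);
fprintf('Complex array: %.2f GB\n', complexBytes / 1e9);
fprintf('Drop per step: %.2f %.2f %.2f GB\n', usedPerStep);
```

The upload accounts for about 1.05 GB, which matches the array size. The FFT step takes about 4.23 GB, roughly twice the 2.1 GB that the complex result alone would need, and the inverse FFT takes a further 1.05 GB.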

Best Answer

MATLAB caches GPU memory, FFT plans and similar resources to make subsequent operations more efficient. The FreeMemory property counts that cached memory as in use, which is why it drops so sharply. The AvailableMemory property takes the caching into account and tells you how much memory is actually available to use (i.e. it knows that MATLAB will flush the caches automatically when necessary). See https://uk.mathworks.com/help/parallel-computing/parallel.gpu.gpudevice.html for more details.
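As a rough illustration, the two properties can be printed side by side after an FFT. This is a minimal sketch, not a benchmark: it assumes a supported GPU and Parallel Computing Toolbox, uses a smaller array than the question (an arbitrary choice so that it fits on most cards), and reads FreeMemory the same way the question does in R2019a.

```matlab
gpu = gpuDevice;

% Smaller array than in the question; the size is an arbitrary assumption.
A = gpuArray(randn(1024, 64, 200, 'single'));
A = fft(A);
wait(gpu);   % block until the GPU has finished before reading memory

% FreeMemory treats MATLAB's cached memory and FFT plans as used;
% AvailableMemory treats them as reclaimable.
fprintf('FreeMemory:      %.3f GB\n', double(gpu.FreeMemory) / 1e9);
fprintf('AvailableMemory: %.3f GB\n', double(gpu.AvailableMemory) / 1e9);
```

If the answer's explanation holds, AvailableMemory should stay close to the card's capacity minus the size of A, while FreeMemory reports the lower figure seen in the question.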