MATLAB: Any way to configure GPU-based FFT library (fft/ifft) to free memory after call

error, fft, gpu, MATLAB, memory, Parallel Computing Toolbox

So, I'm currently trying to optimize a function that makes heavy use of FFTs, and naturally the first thing I tried was moving most of the computations onto the GPU. However, I noticed that somewhere in the middle of the algorithm the memory allocation on the GPU is much higher than warranted by the gpuArrays I have in the workspace (not in a sub-function, and there are no persistent gpuArrays either).
I should add that some of the arrays being processed by the FFT are fairly large (e.g. a 278-by-68-by-32-by-56 complex double array). The problem becomes apparent when executing this code:
% create a large complex double array on the GPU (12 GB card)
r = complex(gpuArray.randn([278, 68, 32, 56]), gpuArray.randn([278, 68, 32, 56]));
% this already allocates more than I would expect from (16 * numel(r))…?
% run an inverse FFT over dim 1
ri = ifft(r, [], 1);
% this allocates a whopping additional 4 GB of GPU memory; why?
% trying to free memory
clear ri;
% no GPU memory freed
clear r;
% only about 1.2 GB freed
When I began to debug this problem, I found that running this simple line
fft(complex(gpuArray.randn([4, 1]), gpuArray.randn([4, 1])));
frees almost all of the additionally (superfluously?) allocated memory. My strong hunch is that this is some internal buffer of the FFT library.
So, the question is: can I, or even should I, safely add this to my code in strategic places to free up GPU memory (for other, non-FFT-related functions)? I'm asking in particular because I want to run two or three instances of MATLAB on the same machine (and the same GPU), but as it stands I always run into unexpected out-of-memory errors when making calls to the FFT library functions…
Thanks a lot in advance! /jochen

Best Answer

What version of MATLAB are we talking about here? And how are you determining whether memory was freed or not?
R2014a and earlier report 'FreeMemory' as a property of gpuDevice. This just tells you how much memory on the device isn't assigned to MATLAB. In R2014b we changed this to 'AvailableMemory', which tells you how much space there is on the device plus how much space is left in the memory pool. You see, MATLAB maintains a memory pool: as you allocate more and more memory on the device, it won't be released when you clear variables; it'll be kept in the pool. So you won't see a reduction in 'FreeMemory' when you clear variables, unless you force the pool to be released by calling reset(gpuDevice).
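As a minimal sketch of how you might observe this (assuming R2014b or later, so that 'AvailableMemory' exists; note that reset destroys every existing gpuArray):

g = gpuDevice();                        % handle to the current GPU device
before = g.AvailableMemory;             % pool-aware free-memory figure (R2014b+)
x = gpuArray.randn([278, 68, 32, 56]);  % allocate roughly 270 MB on the device
clear x;                                % memory goes back to MATLAB's pool
after = g.AvailableMemory;              % reflects the pool, so it recovers
reset(gpuDevice);                       % hard reset: destroys all gpuArrays and
                                        % hands all pooled memory back to the device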
The purpose is to help keep things fast - if you're going to be doing more inverse FFTs on arrays this size, for instance, you'll get a speed-up by not having to wait for the memory to be allocated again.
The problem arises if you want to use multiple instances of MATLAB to access the same GPU device. One MATLAB may end up hogging all the memory, so there isn't any available to the others. You can work around this using an advanced (undocumented) feature: restrict the allowed size of your memory pool with this code:
>> feature('GpuAllocPoolSizeKb', X);
where X is the desired pool size in kilobytes. Set it to 0 to switch the pool off entirely, or to -1 to restore the default.
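As a concrete usage sketch (the 256 MB cap is an arbitrary choice, and since the feature is undocumented its behaviour may change between releases):

>> feature('GpuAllocPoolSizeKb', 262144);   % cap the pool at 256 MB
>> % ... run the FFT-heavy code ...
>> feature('GpuAllocPoolSizeKb', -1);       % restore the default pool size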
This doesn't completely solve the problem. MATLAB can only free GPU memory while it's performing other memory operations, so, ironically, even with a pool size of zero you may find that memory is only freed when you create a new array. This gives PCT the opportunity to clean up the memory held by out-of-scope gpuArrays. To get round this, you may find it useful to create the odd small dummy gpuArray, as sketched below.
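Something like this (the variable name is arbitrary):

>> dummy = gpuArray.zeros(1);   % tiny allocation gives PCT a chance to clean up
>> clear dummy;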
None of this may really help you, depending on what you're trying to do. Ultimately, the FFT and IFFT do need that much working memory while they're computing.
Fortunately, your problem is easy to break down, since you have ND arrays: just divide your input matrices into pieces.
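For instance, a minimal sketch that transforms your array r in slabs along the last dimension (the slab width of 8 is an arbitrary choice, and gathering each slab to the host is optional if the full result fits on the device):

chunkSize = 8;                                   % arbitrary slab width
ri = complex(zeros(size(r)), zeros(size(r)));    % preallocate the result on the host
for k = 1:chunkSize:size(r, 4)
    idx = k : min(k + chunkSize - 1, size(r, 4));
    % inverse FFT over dim 1, as before, but only for this slab
    ri(:, :, :, idx) = gather(ifft(r(:, :, :, idx), [], 1));
end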
Edit: To get good numbers from gpuDevice.AvailableMemory you should also try changing the allowed size of the FFT plan cache:
>> feature('GpuFftMaxStoredSizeKb', 0);
It turns out that for your unusual problem size, the estimate of the plan size that AvailableMemory uses is no good, and the plan size is unusually large (1.5 GB or so). As a result, AvailableMemory doesn't realise that you have additional memory available. Don't worry - this memory would become available as soon as you tried to allocate; it's just the reporting that's wrong.
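Putting it together, a sketch of how you might check the reporting (again, this feature is undocumented):

>> feature('GpuFftMaxStoredSizeKb', 0);   % switch off the FFT plan cache
>> g = gpuDevice();
>> g.AvailableMemory                      % should now report a sensible figure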