Hi,
I have an application in which I need to compute 3D and 4D convolutions, which I have implemented using various methods and combinations of FFTs and linear convolutions. The problem I'm facing, particularly when running these computations on my GPU, is estimating the temporary memory requirements.
My lower-bound estimate for the memory requirements is based on the sizes of the arrays I need to preallocate before the computation plus the output. This is clearly not enough, since I get a CUDA out-of-memory exception a lot sooner than my estimate suggests.
My question therefore is: how much memory does a general convn or fftn operation require? Here is example code for such cases:
padding = [Ydim, Xdim, Zdim];
fftn_out = ifftn(fftn(M,padding) .* fftn(P(:,:,1),padding) .* fftn(K,padding));
or, using convn:
result = convn(convn(M,P(:,:,1),'same'),K,'full');
In the FFT case I clearly know the size of my output and of the padded M, P, and K inputs*. But how much temporary memory is necessary to actually compute fftn_out? My first guess is that I also have to account for storing the outputs of the three fftn calls, each padded to the padding vector and taking twice double precision for the real and imaginary parts. But even then I don't know the temporary requirements of the fftn computation itself, and I also don't know when this memory is freed internally.
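To make my own accounting concrete, here is a rough sketch of just the allocations I can see from the code (Ydim, Xdim, Zdim as above); the internal workspace of the FFT library itself, e.g. the cuFFT plan/work area on the GPU, is not captured here, and the multipliers are only my assumptions about how many intermediates are alive at once:

% Rough lower-bound estimate of the visible allocations for the FFT approach.
% Assumes double precision: a complex double element takes 16 bytes
% (8 real + 8 imaginary). Internal FFT workspace is NOT included.
padding   = [Ydim, Xdim, Zdim];          % padded size used by fftn
bytesCplx = 16 * prod(padding);          % one complex double array

% Three padded fftn outputs, the elementwise product, and the ifftn output:
% depending on when intermediates are freed, the peak is presumably
% somewhere between 2 and 5 such arrays alive at once.
lowerBound = 2 * bytesCplx;              % optimistic: aggressive reuse
upperBound = 5 * bytesCplx;              % pessimistic: everything alive
fprintf('per complex array: %.2f GB\n', bytesCplx/2^30);
fprintf('estimated peak: %.2f - %.2f GB (+ FFT workspace)\n', ...
        lowerBound/2^30, upperBound/2^30);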
The same basic question arises for convn.
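At least the output sizes of convn are documented ('same' keeps size(M); 'full' gives the sum of the sizes minus one), so a minimal estimate of the visible allocations would be something like the following sketch, assuming M and K are both 3-D here and, again, ignoring whatever workspace the implementation uses internally:

% Visible allocations for the nested convn call (real double precision,
% 8 bytes per element); internal workspace is not accounted for.
inner = size(M);                 % convn(M,P(:,:,1),'same') keeps size(M)
outer = inner + size(K) - 1;     % 'full' output size
bytesInner = 8 * prod(inner);
bytesOuter = 8 * prod(outer);
fprintf('inner temp: %.2f GB, result: %.2f GB\n', ...
        bytesInner/2^30, bytesOuter/2^30);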
Any help would be greatly appreciated.
*These are stored in addition to the non-padded M, P, K arrays, I suppose. Would it therefore make sense to pre-pad M, P, K to the padding vector and clear the original M, P, K before doing the fftn multiplication on the pre-padded arrays, as sketched below?
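For illustration, this is the kind of pre-padding I have in mind (a sketch using plain indexing so it should work for regular arrays and gpuArray alike; it assumes M, P(:,:,1), and K all fit inside the padded size, and that no other references to the originals keep them alive after clear):

% Pre-pad M, P(:,:,1), and K to the common FFT size, then release the
% originals so only the padded copies occupy memory during the transforms.
% 'like' keeps the class/device (e.g. gpuArray) of the source array.
padding = [Ydim, Xdim, Zdim];

Mp = zeros(padding, 'like', M);
Mp(1:size(M,1), 1:size(M,2), 1:size(M,3)) = M;
clear M

P1 = P(:,:,1);
Pp = zeros(padding, 'like', P1);
Pp(1:size(P1,1), 1:size(P1,2)) = P1;
clear P P1

Kp = zeros(padding, 'like', K);
Kp(1:size(K,1), 1:size(K,2), 1:size(K,3)) = K;
clear K

% fftn without a size argument transforms the arrays at their padded size:
fftn_out = ifftn(fftn(Mp) .* fftn(Pp) .* fftn(Kp));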