I apply FFT2 to a stack of images, crop part of the result, and then apply iFFT2 to the cropped part:
For example on GPU: FFT2(1920*1240*30 (single) ) -> crop to 320*207*30 (single) -> iFFT2(320*207*30 (single) )
1920/6 = 320
1240/6 ≈ 207 (206.67, rounded)
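As a quick check of these sizes, here is a minimal sketch that evaluates the same index ranges as the crop in the script further down. The names `rows`, `cols`, `cropSize` and `iaSize` are introduced here for illustration only. With those ranges the cropped block comes out as 207 by 321, one column wider than the 207 by 320 test array `Ia`. This may or may not matter for the timings, but FFT libraries are typically fastest for lengths with small prime factors, so the factorizations are printed as well.

```matlab
% Crop ranges copied from the script (cases 2 and 3); runs on the CPU only.
cx = 1920; cy = 1240;

rows = ((cy/2) - round(cy/12)) : ((cy/2) + round(cy/12));   % 517:723
cols = ((cx/2) - round(cx/12)) : ((cx/2) + round(cx/12));   % 800:1120

cropSize = [numel(rows), numel(cols)];      % size of the cropped spectrum, [207 321]
iaSize   = [round(cy/6), round(cx/6)];      % size of the test array Ia,    [207 320]

disp(cropSize)
disp(iaSize)

% Prime factors of each transform length
disp(factor(cropSize(1)))   % 3 3 23
disp(factor(cropSize(2)))   % 3 107
disp(factor(iaSize(2)))     % 2 2 2 2 2 2 5
```

If the two pipelines are meant to be compared on identical sizes, one option would be to drop the last index of `cols` so that the crop is also 320 columns wide.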
The plot shows the execution time of each function, normalized by the number of single-precision elements processed:
Note that the yellow line (FFT2 + crop to 1/6 + iFFT2) is more than an order of magnitude slower than the purple line, which has 36 times more data to process with iFFT2!
Any idea on what is happening here?
Here is the script I have used:
clear
n=10; cx=1920; cy=1240;
FPT=2:5:50; fpt=size(FPT,2);
b=zeros(1,fpt);
for kk=1:8
    for ii=1:fpt
        ii
        I=gpuArray(single(rand(cy,cx,FPT(1,ii))));
        Ia=gpuArray(single(rand(round(cy/6),round(cx/6),FPT(1,ii))+1i.*rand(round(cy/6),round(cx/6),FPT(1,ii))));
        mask=zeros(cy,cx,FPT(1,ii));
        mask(round(cy/2)-round(cy/12):round(cy/2)+round(cy/12),round(cx/2)-round(cx/12):round(cx/2)+round(cx/12))...
            =(ones(size(round(cy/2)-round(cy/12):round(cy/2)+round(cy/12),2),size(round(cx/2)-round(cx/12):round(cx/2)+round(cx/12),2)));
        mask=gpuArray(single(mask));
        tic
        for jj=1:n
            switch kk
                case 1  % fft2 only
                    tic
                    B=fft2(I);
                case 2  % fft2 + crop
                    tic
                    B=fft2(I);
                    C=B(((cy/2)-round(cy/12)):((cy/2)+round(cy/12)),...
                        ((cx/2)-round(cx/12)):((cx/2)+round(cx/12)),:);
                case 3  % fft2 + crop + ifft2 of the cropped part
                    tic
                    B=fft2(I);
                    C=B(((cy/2)-round(cy/12)):((cy/2)+round(cy/12)),...
                        ((cx/2)-round(cx/12)):((cx/2)+round(cx/12)),:);
                    D=ifft2(C);
                case 4  % fft2 + full-size ifft2
                    tic
                    B=fft2(I);
                    C=ifft2(B);
                case 5  % fft2 + mask + full-size ifft2
                    tic
                    B=fft2(I);
                    C=B.*mask;
                    D=ifft2(C);
                case 6  % fft2 + mask + full-size ifft2 + imresize
                    tic
                    B=fft2(I);
                    C=B.*mask;
                    D=ifft2(C);
                    E1=imresize(abs(D),1/6);
                    E2=imresize(angle(D),1/6);
                case 7  % fft2 of I + ifft2 of the small array Ia
                    tic
                    C=fft2(I);
                    B=ifft2(Ia);
                case 8  % ifft2 of the small array Ia only
                    tic
                    B=ifft2(Ia);
            end
        end
        b(1,ii)=toc/n; % b is the time of execution normalized to
        %the amount of data and the number of times a case has been evaluated
    end
    hold on
    plot(b)
    clear A B C D I E1 E2
end
b is the variable plotted in the graph above.
My graphics card is a GeForce RTX 2080 Ti.
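One thing that may be worth ruling out is the measurement itself. In the script, the `tic` inside each `case` restarts the timer on every pass, so `toc/n` reflects only the last iteration, and GPU calls in MATLAB can return before the device has finished the work. The following is a minimal sketch of an alternative measurement for case 3, assuming Parallel Computing Toolbox and a supported GPU. The names `crop`, `pipeline`, `tGpu`, `tManual` and `nFrames` are hypothetical and not part of the original script.

```matlab
% Sketch: time fft2 -> crop -> ifft2 with explicit GPU synchronization.
cy = 1240; cx = 1920; nFrames = 30; n = 10;

rows = ((cy/2) - round(cy/12)) : ((cy/2) + round(cy/12));
cols = ((cx/2) - round(cx/12)) : ((cx/2) + round(cx/12));

I = gpuArray(rand(cy, cx, nFrames, 'single'));

crop     = @(B) B(rows, cols, :);            % keep the central block of the spectrum
pipeline = @() ifft2(crop(fft2(I)));         % case 3 of the script as one function handle

% Option 1: gputimeit runs the handle several times and waits for the GPU.
tGpu = gputimeit(pipeline);

% Option 2: manual loop with one timer and a wait before reading it.
dev = gpuDevice;
wait(dev);                                   % finish any pending GPU work first
t0 = tic;
for jj = 1:n
    D = pipeline();
end
wait(dev);                                   % make sure all n iterations are done
tManual = toc(t0) / n;

% Time per input element, to match the normalization described in the post
fprintf('gputimeit: %.3g s/element, manual: %.3g s/element\n', ...
        tGpu / numel(I), tManual / numel(I));
```

If both options give similar numbers for every case and the yellow line is still slow, the timing method can be excluded as the cause.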
Any help will be appreciated.
Thanks,
Tual
Best Answer