MATLAB: Does copying from gpu to host generally take longer than copying from host to gpu

Parallel Computing Toolbox

Hi,
I am using the parallel computing toolbox, and it seems like copying data to a gpu using gpuArray() generally takes much longer than copying data back to the host using gather(). For example, if I try:
A = rand(500,500, 50);
tic,
B=gpuArray(A);
toc,
tic,
gather(B);
toc,
Then the gather() takes about 0.055 seconds while the gpuArray() takes only 0.018 seconds. Is this behavior expected? Am I using the wrong method to time this?
Thanks.

Best Answer

I think you have arrived at the same conclusion as this blog post on GPU performance. It provides a lot of detail on how to properly benchmark GPU operations in MATLAB.
Related Question