I am using GPU Coder and am concerned about CPU/GPU data transfers hurting performance. Suppose I have two MATLAB functions, each with the coder.gpu.kernelfun pragma at the top, and I do something with the data between the two calls:
A = half(data);
B = kernelfun1(A); % output is B
% do something with B here
C = kernelfun2(B); % input is B
Does the data remain on the GPU the whole time as a half-precision float, or does it get copied to the CPU during the "do something with B" part?
Best Answer