Unlike other matrix operations, circshift seems to run slower on the GPU than on the CPU.
With the following code, the GPU version takes about twice as long as the CPU version. Is there any way to speed it up?
M = rand(512);
N = gpuArray(rand(512));
tic;
for i = 1:10
    circshift(M, [10, 10]);
end
toc;
tic;
for i = 1:10
    circshift(N, [10, 10]);
end
toc;
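A note on the measurement itself: GPU calls can return before the device has finished, and a 10-iteration tic/toc loop also includes one-time startup overhead, so the timings above may not reflect steady-state cost. A minimal sketch of a more even comparison, assuming Parallel Computing Toolbox and a supported GPU are available, uses timeit and gputimeit. The variable names tCpu and tGpu are only illustrative, and this does not by itself make circshift faster; it only makes the comparison more reliable.

```matlab
% Sketch only: assumes Parallel Computing Toolbox and a supported GPU.
M = rand(512);       % data on the CPU
N = gpuArray(M);     % same data copied to the GPU

% timeit and gputimeit call the function several times and report a
% typical per-call time; gputimeit also waits for the GPU to finish
% before stopping the clock.
tCpu = timeit(@() circshift(M, [10, 10]));
tGpu = gputimeit(@() circshift(N, [10, 10]));

fprintf('CPU: %.3g s, GPU: %.3g s\n', tCpu, tGpu);
```

If tic/toc is preferred, calling wait(gpuDevice) before toc should serve a similar purpose for the GPU loop.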
Best Answer