Can anybody explain why these two code snippets have drastically different runtimes?
Both versions share this setup routine:
    clear all
    y = gpuArray.rand(1000, 1000, 'single');
    W = cell(1, 5);
    WFull = gpuArray.zeros(1000, 1000, 5);
    for j = 1:5
        W{j} = gpuArray.rand(1000, 1000, 'single');
        WFull(:,:,j) = W{j};
    end
Version 1 (finishes in 1.4 seconds on my machine)
    z = gpuArray.zeros(1000, 1000, 5);
    tic
    for i = 1:1000
        for j = 1:size(W)
            z(:,:,j) = W{j}*y;
        end
    end
    toc
vs. Version 2 (finishes in 39 seconds on my machine, roughly 27x slower)
    z = gpuArray.zeros(1000, 1000, 5);
    tic
    for i = 1:1000
        for j = 1:size(WFull, 3)
            z(:,:,j) = WFull(:,:,j)*y;
        end
    end
    toc
Do you think that slicing large 3D gpuArrays is just really slow compared to looking up cell array values?
Best Answer