Hello there,
Below is the code I want to run. The rand() calls are only there to keep the example simple; in my real code the variables obviously have meaningful content.
For N = 1024 this takes about 2 hours to run on my machine. I have tried many things, e.g. precalculating the cosArgs.
N = 1024;
img = rand(N);
cosArg1 = rand(N^2,1);
cosArg2 = rand(N^2,1);
[q, p] = meshgrid(0:N-1, 0:N-1); % p and q are both N-by-N matrices
recon = zeros(numel(img),1);
for k = 1:numel(img)
    a = img.*cos(cosArg1(k)*p).*cos(cosArg2(k)*q);
    % sum of a vector is faster than sum(sum(...)) of a matrix, although we need to save it as a variable
    recon(k) = sum(a(:));
end
Is there any clever way to speed this code up?
_______
I also just bought the Parallel Computing Toolbox to make it work with gpuArrays. This now takes about 17 min with a GTX 1060. The variables ending in GPU are just gpuArray casts of their originals.
EDIT: By first casting to single, I cut it down to 10 min.
Is there something I can do better?
cosArg1GPU = gpuArray(single(cosArg1));
cosArg2GPU = gpuArray(single(cosArg2));
imgGPU = gpuArray(single(img));
reconGPU = gpuArray(single(recon));
pGPU = gpuArray(single(p));
qGPU = gpuArray(single(q));
for k = 1:numel(imgGPU)
    a = imgGPU.*cos(cosArg1GPU(k)*pGPU).*cos(cosArg2GPU(k)*qGPU);
    % sum of a vector is faster than sum(sum(...)) of a matrix, although we need to save it as a variable
    reconGPU(k) = sum(a(:));
end
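A note on the structure of the loop body, offered as a sketch rather than a measured fix: because p varies only along rows and q only along columns, the summand is separable, so sum(a(:)) for a given k equals u' * img * v with u = cos(cosArg1(k)*n), v = cos(cosArg2(k)*n) and n = (0:N-1)'. Stacking several k into matrices turns the inner work into a matrix product and cuts the number of cos evaluations from about 2*N^4 to about 2*N^3. The names n, U, V, idx, reconLoop, reconFast and the block size of 256 are assumptions made for illustration, and the example uses a small N only so that the comparison against the original loop runs quickly.

```matlab
% Small N so the check against the original loop is quick; the question uses N = 1024.
N = 32;
img = rand(N);
cosArg1 = rand(N^2, 1);
cosArg2 = rand(N^2, 1);
[q, p] = meshgrid(0:N-1, 0:N-1);

% Reference result: the loop from the question.
reconLoop = zeros(N^2, 1);
for k = 1:N^2
    a = img .* cos(cosArg1(k)*p) .* cos(cosArg2(k)*q);
    reconLoop(k) = sum(a(:));
end

% Separable form, processed in blocks of k to bound memory use.
n = (0:N-1)';          % row/column index vector, matches the values in p and q
blockSize = 256;       % hypothetical value, not tuned
reconFast = zeros(N^2, 1);
for k0 = 1:blockSize:N^2
    idx = k0:min(k0 + blockSize - 1, N^2);
    U = cos(n * cosArg1(idx)');   % N-by-B, column j is u for k = idx(j)
    V = cos(n * cosArg2(idx)');   % N-by-B, column j is v for k = idx(j)
    % Column j of (img * V) is img * v_j; weighting by u_j and summing gives u_j' * img * v_j.
    reconFast(idx) = sum(U .* (img * V), 1)';
end

% Largest deviation from the original loop, relative to the largest result.
maxRelErr = max(abs(reconFast - reconLoop)) / max(abs(reconLoop));
```

The two results should agree up to floating-point rounding, since only the order of the additions changes. The same expressions (cos, matrix multiply, sum) are also available for gpuArray inputs, though whether that beats the timings above has not been checked here.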
Best Answer