Hello, I took over someone else's code, which had several for-loops in it, and vectorized it. The code now runs extremely fast, and even faster when using gpuArray(). I'm now at the point where the slowest part of the code is where I use the accumarray function. I'm not sure whether there are alternative approaches to vectorizing this code without using accumarray.
Here is a rough example of the original loop-based code, followed by my vectorized version of it:
nx = 16;
nz = 512;
A = rand(nx,nx,nz);

%% First Method For Loops
R1 = zeros(nx,nx,nz);
B1 = zeros(nx,nz);
N1 = zeros(nx,nz);
for k = 1:nz
    for j = 1:nx
        for i = 1:nx
            r = round((i^2+j^2)^0.5);
            if(r <= nx)
                R1(i,j,k) = r;
                N1(R1(i,j,k),k) = N1(R1(i,j,k),k)+1;
                B1(R1(i,j,k),k) = B1(R1(i,j,k),k) + A(i,j,k);
            end
        end
    end
    N1(1,k) = 1;
end
B1 = B1./N1;

%% Second Method Vectorized
[I,J] = ndgrid(1:nx,1:nx);
r = (I.^2+J.^2).^0.5;
R2 = round(r);
L21 = (R2 <= nx);
N2 = squeeze(repmat(histcounts(L21.*R2(:,:,1),1:nx+1),[1 1 nz]));
A2 = reshape(A,[1,numel(A)]);
B2 = reshape(repmat(L21.*R2,[1 1 nz]),[1,numel(A)]);
L22 = logical(B2).*repelem(0:nx:(nz-1)*nx,(nx)^2);
B2 = B2+L22;
B2 = reshape(accumarray((B2+logical(~B2)).',A2.*logical(B2)).',[nx nz])./N2;

% Check that Vectorized Code = For Loop Code
sum(sum(sum(B1-B2)))
The vectorized code isn't much faster when using gpuArray() and is only a tiny bit faster than the non-vectorized code without using gpuArray(). It is currently the bottleneck within my code and I'm not sure what other functions might be able to help me out here.
The great thing about the vectorized code is that I can put it into a function and pull several of the variables out of the function into an initialization step. Still, the accumarray call takes a significant amount of time to run compared to all the other vectorized parts of my code (not shown). And when accumarray is put into a function, it appears to be slower when I put tic/toc just outside of the function.
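For concreteness, a minimal sketch of that initialize/per-call split, assuming the bin indices depend only on nx and nz. The names valid, counts, subs, Ap, and vals are illustrative and do not come from the code above, and the two-column subscript form is just one possible way to call accumarray here, not a confirmed speed-up:

```matlab
% Sketch only: variable names below are hypothetical, not from the original code.
nx = 16; nz = 512;
A = rand(nx,nx,nz);

%% Initialize step (run once; depends only on nx and nz)
[I,J]   = ndgrid(1:nx,1:nx);
R       = round(sqrt(I.^2 + J.^2));        % radial bin of each (i,j)
valid   = R <= nx;                         % pixels whose bin exceeds nx are dropped
counts  = accumarray(R(valid),1,[nx 1]);   % samples per bin, identical for every slice
[Rg,Kg] = ndgrid(R(valid),1:nz);           % bin and slice index for every kept sample
subs    = [Rg(:) Kg(:)];                   % two-column subscripts into an nx-by-nz result

%% Per-call step (run for every new A)
Ap   = reshape(A,nx*nx,nz);                % one column per slice
vals = Ap(valid(:),:);                     % kept pixels, same ordering as subs
B    = accumarray(subs,vals(:),[nx nz]) ./ counts;   % per-bin mean for each slice

%% Timing the accumarray call alone (timeit runs it several times)
t = timeit(@() accumarray(subs,vals(:),[nx nz]));
```

Timing with timeit rather than a single tic/toc pair may give steadier numbers, since one tic/toc measurement also picks up first-call overhead. The division by counts relies on implicit expansion, which needs R2016b or later.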
Thanks!
Best Answer