MATLAB: Vectorize a parfor loop to save time

     parfor ii=1:(subnumX-1)*(subnumY-1)
         [h1,p1] = subsolverCB(S1Group{ii}, Hhat1Group{ii}, tau, alpha, kappa, gamma, nIn,H1{ii},P1{ii},1); %Omega1%
         H1{ii}=h1;
         p1x=size(p1,1);
         p1y=size(p1,2);
         p1y=p1y/2;
         P1{ii}(1:p1x, 1:p1y)                       =p1(:,1:p1y);
         P1{ii}(1:p1x, subsizeY+1:(subsizeY+p1y))   =p1(:,(p1y+1):end);
     end

As shown above, I used a parfor loop inside a while loop, but the elapsed time was several times longer than with a plain for loop. Maybe it would be better to vectorize this part to save time, but I don't know how to do it. Thanks in advance!

Here is the body of my function subsolverCB. Every 'if' branch shown is taken, since I deleted the false cases:

function [hSub,pSub] = subsolverCB(s1, h1hat, tau, alpha, kappa, gamma, nIn, HGroup, pInit, flag)
[m,n] =size(h1hat);
hSub = HGroup;
hcheck = HGroup;
c=size(pInit,2)/2;
Es =zeros(size(h1hat));
errSub=zeros(nIn,1);
if flag==1
  px=pInit(1:m-1,1:n-1);
  py=pInit(1:m-1,c+1:c+n-1);
  Es(1:end-1,1:end-1)=s1;
end
iterationSub = 1;
while iterationSub <= nIn
    hold = hSub;
    [hx, hy] = grad2(hcheck);
    if flag==1
        Rhx=hx(1:end-1,1:end-1);
        Rhy=hy(1:end-1,1:end-1);
    end
    ptildex = px + kappa*Rhx;
    ptildey = py + kappa*Rhy;
    Denom = sqrt(ptildex.^2+ptildey.^2);
    px = ptildex ./ max(Denom, 1);
    py = ptildey ./ max(Denom, 1);
    if flag==1
        PPtmp=[px,zeros(m-1,1),py,zeros(m-1,1)];
        PP=[PPtmp;zeros(1,size(PPtmp,2))];
    end
    Edivp = div2(PP);
    htilde = hSub + gamma*(Edivp);
    tmp= (tau*htilde + gamma*h1hat-tau*gamma*alpha*Es)/(tau+gamma);
    hSub=max(min(tmp,1),0);
    hcheck = 2*hSub - hold;
    errSub(iterationSub)=norm(hSub-hold,'fro');
    iterationSub = iterationSub+1;
end
pSub=[px py];
end

%function [hSub,pSub] = subsolverCB(s1, h1hat, tau, alpha, kappa, gamma, nIn, HGroup, pInit, flag)
function [hSub,pSub] = subsolverCB(s1, h1hat, tau, alpha, kappa, gamma, nIn, hSub, pInit, flag)
[m,n] = size(h1hat);
%hSub = HGroup; %Why do you need 2 copies of HGroup? How about just replacing input HGroup with hSub?
hcheck = hSub;
c = size(pInit,2)/2;
Es = zeros(size(h1hat));
errSub = zeros(nIn,1);
if flag %== 1, no need to check flag==1 if flag is either true or false
    px = pInit(1:m-1,1:n-1);
    py = pInit(1:m-1,c+1:c+n-1);
    Es(1:end-1,1:end-1) = s1;
end
% iterationSub = 1;
% while iterationSub <= nIn
tmpC1 = gamma*h1hat - tau*gamma*alpha*Es; %To prevent repeated calculations in the for loop
for iterationSub = 1:nIn
    hld = hSub; %hold = hSub; %don't shadow the MATLAB function "hold", used for holding plots.
    [hx, hy] = grad2(hcheck);
    if flag %== 1
        Rhx = hx(1:end-1,1:end-1);
        Rhy = hy(1:end-1,1:end-1);
    end
    ptildex = px + kappa*Rhx;
    ptildey = py + kappa*Rhy;
    Denom2 = ptildex.^2+ptildey.^2; %Denom = sqrt(ptildex.^2+ptildey.^2);
    MaxDenom = sqrt(max(Denom2, 1)); %clamp first, then sqrt; same values as max(Denom, 1)
    px = ptildex ./ MaxDenom; %max(Denom, 1);
    py = ptildey ./ MaxDenom; %max(Denom, 1);
    if flag %== 1
        %PPtmp = [px,zeros(m-1,1),py,zeros(m-1,1)];
        %PP = [PPtmp;zeros(1,size(PPtmp,2))];
        %Why not just initialize in one shot?
        %PPtmp no longer exists here, so the width of the zero row is written as 2*n.
        PP = vertcat([px,zeros(m-1,1),py,zeros(m-1,1)], zeros(1, 2*n));
    end
    %Edivp = div2(PP);
    %htilde = hSub + gamma*(Edivp);
    htilde = hSub + gamma*(div2(PP)); %condensing into 1 line
    %tmp = (tau*htilde + gamma*h1hat - tau*gamma*alpha*Es)/(tau+gamma);
    %Prevent repeated calculation of gamma*h1hat, tau*gamma*alpha*Es, etc.
    tmp = (tau*htilde + tmpC1)/(tau+gamma);
    hSub = max(min(tmp,1),0);
    hcheck = 2*hSub - hld;
    errSub(iterationSub) = norm(hSub-hld,'fro');
end
pSub = [px py];
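
The "initialize in one shot" idea for PP can be checked in isolation. The sketch below uses made-up sizes and data (m, n, px and py here are illustrative, not the real inputs) and compares the two-step construction from the question with a single concatenation. Since px and py are (m-1)-by-(n-1), the padded matrix has 2*(n-1) + 2 = 2*n columns.

```matlab
% Made-up sizes and data, for illustration only.
m = 5; n = 4;
px = reshape(1:(m-1)*(n-1), m-1, n-1);
py = -px;

% Two-step construction, as in the question.
PPtmp = [px, zeros(m-1,1), py, zeros(m-1,1)];
PPtwoStep = [PPtmp; zeros(1, size(PPtmp,2))];

% Single concatenation: the semicolon starts the extra zero row of width 2*n.
PPoneShot = [px, zeros(m-1,1), py, zeros(m-1,1); zeros(1, 2*n)];

sameSize = isequal(size(PPoneShot), [m, 2*n]);
sameValues = isequal(PPtwoStep, PPoneShot);
```

Both flags should come out true, which means the intermediate PPtmp is not needed to know the width of the zero row.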

Best Answer

I've looked through the code and it is mostly vectorized already. See the comments, though, for minor improvements. At this point, you'll have to try out different MATLAB operations to see whether you can find shortcuts, prevent unnecessary matrix copies, etc. You are essentially doing micro-optimization, which is often time-consuming but could be worth it, depending on your speed requirements.
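
Before relying on rewrites like these, it is worth confirming that they leave the results unchanged. The sketch below is only an illustration with made-up sizes and parameter values: it compares the clamped denominator written both ways, and the update term with and without the loop-invariant part hoisted out.

```matlab
% Made-up test data; the real arrays come from the solver.
m = 64; n = 48;
ptildex = reshape(linspace(-2, 2, m*n), m, n);
ptildey = reshape(linspace(1.5, -1.5, m*n), m, n);

% Denominator: sqrt then clamp, versus clamp then sqrt.
denomA = max(sqrt(ptildex.^2 + ptildey.^2), 1);
denomB = sqrt(max(ptildex.^2 + ptildey.^2, 1));
denomDiff = max(abs(denomA(:) - denomB(:)));

% Update term: full expression versus hoisted loop-invariant part.
tau = 0.5; gamma = 0.25; alpha = 2;   % illustrative values only
h1hat  = reshape(linspace(0, 1, m*n), m, n);
Es     = reshape(linspace(1, 0, m*n), m, n);
htilde = 0.5 * h1hat + 0.1;

tmpA  = (tau*htilde + gamma*h1hat - tau*gamma*alpha*Es) / (tau + gamma);
tmpC1 = gamma*h1hat - tau*gamma*alpha*Es;   % computed once, outside the loop
tmpB  = (tau*htilde + tmpC1) / (tau + gamma);
tmpDiff = max(abs(tmpA(:) - tmpB(:)));
```

The hoisted form changes the order of the additions, so tmpDiff may be a tiny nonzero rounding difference rather than exactly zero. Whether either rewrite is actually faster is a separate question that needs measuring, for example with tic/toc or timeit on the real array sizes.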

As for parfor, see the comment above. It seems difficult to vectorize across the parallel calls to subsolverCB.
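
To judge whether parfor is worth it here, one option is to time the same loop both ways on a cheap stand-in workload. The sketch below is a minimal illustration, not the real solver: `work` is a hypothetical placeholder for the subsolverCB call, and the block count and sizes are made up. As far as I know, parfor only runs in parallel when Parallel Computing Toolbox is available and otherwise falls back to a serial loop.

```matlab
% Hypothetical stand-in for the per-block solver call.
work = @(A) max(min(A + 0.1*A.^2, 1), 0);

% Made-up input blocks.
nBlocks = 12;
H = cell(1, nBlocks);
for k = 1:nBlocks
    H{k} = reshape(linspace(0, 1, 20*20), 20, 20) * k / nBlocks;
end

% Plain for loop.
Hserial = cell(1, nBlocks);
tic;
for ii = 1:nBlocks
    Hserial{ii} = work(H{ii});
end
tSerial = toc;

% Same loop with parfor; Hpar is written one whole cell per iteration.
Hpar = cell(1, nBlocks);
tic;
parfor ii = 1:nBlocks
    Hpar{ii} = work(H{ii});
end
tPar = toc;

fprintf('for: %.4f s, parfor: %.4f s\n', tSerial, tPar);
sameResult = isequal(Hserial, Hpar);
```

When each iteration is as short as this one, the cost of starting the pool and moving data to the workers can easily outweigh the computation, which would be consistent with the slowdown described in the question. Timing the real subsolverCB call the same way should show whether that is the case.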

See if you can avoid creating the temporary variables px and py, which are initialized as subsets of pInit.
