The statements
are unnecessary overhead and cloud the issue. I re-implemented your code as shown below, removing all three occurrences of these statements and also handling v1 more efficiently. On my machine, the results for a plain for-loop are not significantly different, which is what I expected.
When running with parfor, I find that loop 1 is indeed about 25% faster than loop 2. I assume this is because parfor pre-parses the contents of the loop, analyzing which variables are temporary and which play other roles; see Classification of Variables. Because of this, I imagine parfor can pre-allocate temporary variables or optimize their use in some other non-obvious way.

v1 = L_rate/sqrt(cd);
Hlcd = v1*Hlcd;
for k = 1:5
    tic
    parfor (n = 1:10000, 2)
        v2 = v1*[Vl0, 1].';
        v3 = rand(size(Hl0)) < Hl0;
        v4 = Vlcd'*Hlcd;
        dwnet = v2*v3 - v4;
    end
    time1(k,1) = toc;
    tic
    parfor (n = 1:10000, 2)
        dwnet = (v1*[Vl0, 1].')*(rand(size(Hl0)) < Hl0) - Vlcd'*Hlcd;
    end
    time2(k,1) = toc;
end
Follow-up Question: Say that Vl0 and Hl0 depend on dwnet and should be updated with every iteration, and that Vlcd changes with each iteration. However, I could leave Vl0, Hl0, and dwnet at the same values for several iterations while changing Vlcd (say, 10-100 iterations per batch).
You would probably have to use a double for-loop
for i = 1:N
    parfor j = 1:M
    end
end
where the inner parfor loop iterates over the batches during which Vl0 and Hl0 remain constant. It might not be worthwhile, though. Batches that small could probably be vectorized, circumventing a for-loop altogether; at least, your simplified example certainly can be.
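As a rough sketch of that vectorization: if, say, what you ultimately need from a batch is the sum of dwnet over its iterations, the sum can be pulled inside the linear algebra, because the first term is constant within the batch and the second term is linear in Vlcd. The names Vlcd_batch, M, and dwnet_sum are mine, not from your code, and I'm assuming Vlcd_batch holds one Vlcd row vector per batch member:

```matlab
% Hypothetical sketch: replace M iterations (fixed Vl0, Hl0; varying Vlcd)
% with one vectorized computation of the batch's summed dwnet.
M = 100;                              % assumed batch size
Vlcd_batch = rand(M, numel(Vlcd));    % stand-in: one Vlcd per row
fixedTerm = (v1*[Vl0, 1].')*(rand(size(Hl0)) < Hl0);  % constant in the batch

% sum over k of (fixedTerm - Vlcd_k'*Hlcd)
%   = M*fixedTerm - (sum of Vlcd_k)'*Hlcd
dwnet_sum = M*fixedTerm - sum(Vlcd_batch, 1)'*Hlcd;
```

Whether this applies depends on what you actually do with each per-iteration dwnet; if you need every individual dwnet rather than an aggregate, the savings are smaller.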
You also have to factor in communication to and from the labs. A normal parfor loop has to broadcast its data to the workers when it begins and gather results back to the client when it ends, so in your case at the beginning and end of every iteration of the outer for-loop. It may help, however, to use Edric Ellis' Worker Object Wrapper, which can make data persist on the workers between parfor loops.
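For illustration, newer Parallel Computing Toolbox releases ship a built-in mechanism with the same goal, parallel.pool.Constant: the wrapped data is copied to each worker once and then read locally inside subsequent parfor loops. This is a sketch under that assumption, with made-up variable names (bigData, N, M, out):

```matlab
% Sketch: keep a large constant array on the workers across several
% parfor loops, instead of re-broadcasting it from the client each time.
bigData = rand(1000);                 % stand-in for data like Vl0/Hl0
C = parallel.pool.Constant(bigData);  % transferred to each worker once
N = 5; M = 100;                       % assumed loop bounds
out = zeros(M, 1);
for i = 1:N
    parfor j = 1:M
        % C.Value is read on the worker; no per-loop broadcast of bigData
        out(j) = sum(C.Value(:)) + j;
    end
end
```

The trade-off is the same as with the Worker Object Wrapper: you pay the transfer cost once up front, which only helps when the data is reused across many parfor loops.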