Hi,
I am wondering how I can calculate the transmission limit of the parfor loop below
Profiler output for the parfor line: Time 7.41, Calls 4.

    parfor i=1:n
        for j=1:n
            if i<j
                c_ij=S(a{i} & b{j});
                ex_ij=e(c_ij);
                smat(i,j)=max(ex_ij);
                abest_ij=abs(smat(i,j)-ex_ij)<5000*eps;
                slcCell{i,j}=c_ij(abest_ij);
            elseif i>j
                c_ij=S(a{i} & b{j});
                ex_ij=e(c_ij);
                smat(i,j)=max(ex_ij);
                abest_ij=abs(smat(i,j)-ex_ij)<5000*eps;
                slcCell{i,j}=c_ij(abest_ij);
            else
            end
        end
    end
for n=25 or n=26, so that I can optimize this code further and extend its computational limits. For an older version of this loop, I got a Java exception for n=25 indicating that the transmission limit of 2 GB had been exceeded. The current loop still produces a serialization error for n=26, which is also caused by exceeding the transmission limit, with eight workers launched in total. Given my naive estimate, however, I had already expected a transmission limit error with this code for n=25. Before I turn to that estimate, here is a short description of the variables and an explanation of the loop.
Description of the variables:
N=2^n-1.
S and e are double arrays of length N.
a{i} and b{j} are logical arrays of length N: a{i} marks the entries of S with bit i, and b{j} marks the entries of S without bit j.
S, e, and b are indexed but not sliced; only a is sliced.
smat and slcCell are used elsewhere and are of no relevance here.
According to a profile analysis, the built-ins and overhead of the complete function consume about 10 percent of the elapsed computing time.
The purpose of this parfor loop is to make a pre-selection of S for all pairs (i,j).
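For readers who want to reproduce the setup at a small scale, the variables described above could be sketched as follows. This is only an illustration under assumptions that the post does not state: S is taken to be 1:N with each entry read as a bit mask, e is filled with random placeholder values, and a and b are built with bitget.

```matlab
% Toy-scale sketch of the variables (assumptions: S = 1:N, e is random).
n = 4;
N = 2^n - 1;
S = 1:N;                 % assumed: each entry of S is a bit mask
e = rand(1, N);          % placeholder values, not the real data

a = cell(1, n);
b = cell(1, n);
for k = 1:n
    a{k} = bitget(S, k) == 1;   % logical mask: entries of S with bit k set
    b{k} = bitget(S, k) == 0;   % logical mask: entries of S with bit k clear
end

% Pre-selection for one pair (i,j), following the loop body.
i = 1; j = 2;
c_ij  = S(a{i} & b{j});                     % entries with bit i but without bit j
ex_ij = e(c_ij);                            % values of e at those entries
best  = max(ex_ij);
sel   = c_ij(abs(best - ex_ij) < 5000*eps); % entries attaining the maximum
```

At this size every array is tiny, so the sketch is only useful for checking the logic of the pre-selection, not for reproducing the memory problem.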
Now, let us discuss what I expected for n=25 and n=26.
1.) n=25
1.1 The data arrays S and e have 256 MB each.
1.2 The data arrays a{i} and b{j} have 32 MB each.
Launching eight workers, I already get for case 1.1: 512*8 = 4096 MB > 2 GB

2.) n=26
2.1 The data arrays S and e have 512 MB each.
2.2 The data arrays a{i} and b{j} have 64 MB each.
Launching eight workers, I now get for case 2.1: 1024*8 = 8192 MB > 2 GB
By this estimate, the above code should fail in both cases, but it only fails in the second case (n=26).
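The naive estimate above can be written as a short calculation. This is just a sketch of my own arithmetic: it assumes 8 bytes per double, 1 byte per logical, and that S and e are each copied in full to all eight workers. Whether the 2 GB limit applies to that total or to each individual transfer is exactly what I am unsure about.

```matlab
% Back-of-the-envelope size estimate (eight workers, as in the post).
nWorkers = 8;
MB = 2^20;
for n = [25 26]
    N = 2^n - 1;
    bytesDouble  = 8 * N;    % one double array of length N (S or e)
    bytesLogical = 1 * N;    % one logical array of length N (a{i} or b{j})
    totalSe = 2 * bytesDouble * nWorkers;   % S and e copied to every worker
    fprintf('n=%d: S or e ~ %.0f MB, a{i} or b{j} ~ %.0f MB, S+e on %d workers ~ %.0f MB\n', ...
        n, bytesDouble/MB, bytesLogical/MB, nWorkers, totalSe/MB);
end
```

The printed values match the figures in 1.1, 1.2, 2.1, and 2.2 after rounding to whole megabytes.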
My questions are:
1.) What is the correct estimate for cases 1.1 and 2.1, so that I can improve this code further?
2.) The variables S, e, and b are not sliced, which probably triggers the communication overhead of 10 percent. How is it possible to make S, e, and b sliced? Does this even make sense, given that I have to pass the complete data arrays S and e to make a pre-selection for the pair (i,j)?
3.) Are there any plans to drop the transmission limit of 2 GB in the parfor loop in a forthcoming MATLAB release? I totally agree with the philosophy that constraints are helpful for optimizing code, but only up to a certain limit.
Cheers, Holger
Best Answer