Hi,
I'm new to parallel processing and hoping I can get some suggestions from the larger MATLAB community. I have a set of N column vectors, each of size 4×1, that can be stacked into a 4×N matrix X. At each time step, I need to run update calculations on each of the N vectors. The updated values then become part of the input at the next time step. I have already vectorized everything, so there are no for loops in calculating the update.
I'm trying to speed up the process more by using parfor. I have seen improvement from 1 to 2 cores but 3 and 4 cores are both comparable to 2 cores. I'd like to see if I can continue improving with additional cores (particularly if I were to run this on a larger cluster). I have read enough elsewhere to understand this may not be possible, but I'd like to try.
I'm currently sending the workers everything at each time step, but it seems like I should be able to keep the updated information on the workers so they can use it at the next time step. To hopefully make this a little clearer, I am doing the following in pseudo-code:
X = InitialCondition();
for k = 2:numtimesteps
    % turns X into a cell array, where each cell can go to a worker
    XC = SliceFunction(X, numworkers);
    XCnew = cell(1, numworkers);
    parfor i = 1:numworkers
        % otherinputs is much smaller than X
        XCnew{i} = UpdateFunction(@CalculateAB, XC{i}, otherinputs);
    end
    % final X, which becomes the input at the next timestep
    X = [XCnew{:}];
end

function XCnew = UpdateFunction(CalculateAB, XC, otherinputs)
    % this function calculates A,B, then solves x=A\B
    % note A,B are each 3D arrays and I need to solve A*x=B for each 2D slice
    [A, B] = CalculateAB(XC, otherinputs);
    % this is a modification of the File Exchange multinv
    % it turns the 3D A,B matrices into sparse 2D matrices and solves using the \ operator
    XCnew = multimldivide(A, B);
end
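For concreteness, here is a minimal sketch of what the slicing step could look like. It is only an illustration, not the actual SliceFunction: it assumes the columns of X are split into contiguous, near-equal blocks and that N is at least numworkers.

```matlab
function XC = SliceFunction(X, numworkers)
    % Illustrative sketch only (assumed behavior, not the real SliceFunction):
    % split the columns of X into numworkers contiguous blocks.
    N = size(X, 2);                          % number of column vectors
    base = floor(N / numworkers);            % minimum columns per block
    extra = mod(N, numworkers);              % leftover columns to spread out
    counts = base * ones(1, numworkers);
    counts(1:extra) = counts(1:extra) + 1;   % first blocks take one extra column
    % mat2cell keeps all rows together and splits columns according to counts
    XC = mat2cell(X, size(X, 1), counts);
end
```

With this layout, [XC{:}] puts the blocks back together in the original column order, which is what the X = [XCnew{:}] line in the pseudo-code relies on.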
So to summarize, I guess the question is this: can I keep information on the workers so that I don't have to send as much info back and forth? I'm hoping this could reduce the overhead involved with using parfor, so that I can continue to see speed improvements as I increase the number of cores. Or are there other tricks to reduce the overhead? I'm constantly going in and out of the parfor with each iteration of k.
Thanks!
Best Answer