MATLAB: Synchronization for parallel code

parallel computing, spmd, synchronization

Hi,
I have a simple task to accomplish using the Parallel Computing Toolbox and the Distributed Computing Server. I need to execute a program (unix() call) on multiple workers. The workers all run the same program, which takes in a data file and writes a data file. Each worker's instance of the program takes a different data file from a directory that all workers can access. There are more data files than workers, so I would like each worker to pick another data file after it finishes running the program on the previous one, and to keep doing this until all data files are exhausted. After reading the doc, it seems spmd is the way to go.
I have a cell array of file names and I would like to create an array of booleans of the same dimension to use for synchronization. When a worker picks a file to work on, it should set the boolean at that file's index. When a worker finishes its program run, it scans the booleans to find a file that has not been run yet, sets the boolean for that file index, and then runs that one.
Is there any way for the workers to see a common array and also modify this same array?
Right now I'm doing it in a parfor loop whose range is the maximum number of workers. This works fine, but I have a lot more files than workers, so I have to wait for the parfor loop to finish before manually repeating the process for the remaining files. The program takes a variable number of hours to complete on each worker, so the parfor loop is bottlenecked by the longest run.
Is there a way to accomplish this using tasks as well? It seems like a simple thing but I have not found any examples of how to do this.
Thank you for reading.

Best Answer

It sounds like you have independent tasks with more work to do than workers to do it. You also would like the tasks to be properly load-balanced. With these goals, parfor is the right tool. Note that parfor is a different construct from spmd in the Parallel Computing Toolbox.
Since you already have a cell array of file names and all the workers have access to the directory they are in, you can run the parfor loop over all of the files rather than over the number of workers. parfor then distributes the iterations among the workers for you, so there is no need to repeat the loop manually for the remaining files.
For example, if a different A variable is saved in each file, this code will allow 2 workers to process 5 files independently without stomping on each other (no boolean array required).
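% Open a pool of 2 local workers.
% In R2013b and later, use parpool('local', 2) instead of matlabpool.
matlabpool open local 2
inFiles = {'A.mat', 'B.mat', 'C.mat', 'D.mat', 'E.mat'};
outData = cell(1, length(inFiles));    % one result slot per input file
parfor i = 1:length(inFiles)
    S = load(inFiles{i}, 'A');         % load only the variable A from file i
    outData{i} = S.A + 3;              % each iteration writes only its own slot
end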
At the end of the computation, outData on the client holds one result per file: each iteration stores its result in its own element of the cell array, so the array has the same number of elements as the number of files processed.
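Since your workers actually run an external program rather than loading .mat files, the same pattern might look roughly like the sketch below. This is only an illustration under assumptions: the file names are made up, and echo stands in for your real executable. It uses system, which returns the exit status and the captured output; unix has the same two-output form.

```matlab
% Sketch: run an external command once per input file inside parfor.
% The file names are hypothetical, and 'echo' is a stand-in for the real program.
inFiles = {'run1.dat', 'run2.dat', 'run3.dat', 'run4.dat', 'run5.dat'};
status  = zeros(1, numel(inFiles));   % exit code of each run
output  = cell(1, numel(inFiles));    % captured command output of each run

parfor i = 1:numel(inFiles)
    % Build the command line for this file (replace with the real command).
    cmd = sprintf('echo processing %s', inFiles{i});
    [st, out] = system(cmd);
    status(i) = st;                   % each iteration writes only its own element
    output{i} = out;
end

% Files whose run returned a nonzero exit code.
failed = inFiles(status ~= 0);
```

Collecting the exit codes in a sliced array lets you check after the loop which files need to be rerun, without any shared boolean array.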
Unfortunately, you cannot call save directly in the body of a parfor loop. You didn't mention how much data you need to save at each iteration. If it is modest, you can store the results as shown above until the parfor loop completes and then save them in a post-processing step that runs outside the parfor loop.
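If the results are too large to hold until the loop completes, a common workaround is to move the save call into a small helper function and call that helper from the loop. The sketch below assumes a helper named parsave (a hypothetical name, not a built-in) and made-up output file names. It creates a few small input files in a temporary directory first so that it runs on its own. Defining the function at the end of a script requires R2016b or later; in older releases, put it in its own file parsave.m.

```matlab
% Sketch: write one result file per iteration through a helper function.
% parsave is a hypothetical helper; the '_out.mat' naming is illustrative.
workDir = tempname;                    % scratch directory for this example
mkdir(workDir);
names = {'A', 'B', 'C'};
for k = 1:numel(names)
    A = k;                             % dummy input data
    save(fullfile(workDir, [names{k} '.mat']), 'A');
end

parfor i = 1:numel(names)
    S = load(fullfile(workDir, [names{i} '.mat']), 'A');
    result = S.A + 3;
    % The save happens inside the helper, not in the parfor body itself.
    parsave(fullfile(workDir, [names{i} '_out.mat']), result);
end

function parsave(fname, result)
    % save sees this function's own workspace, which contains 'result'.
    save(fname, 'result');
end
```

Each iteration writes to a different file, so the workers do not interfere with each other.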