MATLAB: How to update variable within a matfile inside a parfor loop

Tags: matfile, parallel computing, parfor

I have a very expensive loop that I'm trying to parallelize, and part of this loop involves updating an entry in a 4D array inside a matfile (I must save results to disk and access them through a matfile pointer due to RAM limitations). However, I get an error that says that the matfile pointer variable cannot be classified. As an illustrative example of what I'm trying to do, consider the code below:
testOut = []; % create variable to try and update
save('TestFile.mat', 'testOut', '-v7.3'); % save variable into accessible matfile
FileOut = matfile('TestFile.mat','Writable',true); % Set up pointer to matfile
nrows = 200; ncols = 200; nplanes = 32; nvolumes = 10; % Set up dimensions of 4D array
FileOut.testOut = zeros(nrows,ncols,nplanes,nvolumes,'single'); % Set initial size of variable in matfile
parfor i = 1:nvolumes
    FileOut.testOut(:,:,:,i) = i; % artificial example, point is that I want to update variable using fourth dimension index
end
The error this code would report is:
Error: The variable FileOut in a parfor cannot be classified.
See Parallel for Loops in MATLAB, "Overview".
Basically, the loop is performing an independent calculation on a 3D volume image each time, and each resulting volume image is saved in a 4D array using the fourth dimension to mark volume image number. However, I often have thousands of these images, and so the 4D array must be saved to disk and accessed from a matfile to avoid overloading the RAM. I'd like to adapt my code to use parfor, but I can't figure out how to get parfor to play nicely with the matfile pointer. Can anyone help me out here, please? I understand that sliced variables must be used with parfor, and that variables of the form I'm using here are not allowed, but I can't figure out a solution…

Best Answer

It might be possible to overcome the "slicing" problems you're seeing here - but you're still left with the fundamental underlying problem that you're trying to get multiple worker processes to write to the same file concurrently. That is never going to work well, as the writes will conflict and almost inevitably corrupt your file.
What I'd suggest is having each worker write to its own temporary .mat file during the parfor loop, and then running a post-processing stage on the client to collect the results. (I'm presuming here that computing the data to go into the file takes a long time, but accessing the data in the file is relatively inexpensive.) I'm going to use parallel.pool.Constant, which was introduced in R2015b, but the same can be achieved in earlier releases using the Worker Object Wrapper.
This example is a bit involved, but hopefully it's clear what's going on. You'll need to adapt things a little to get them to work with your multi-dimensional data.
%% Step 1: create a mat-file per worker using SPMD
spmd
    myFname = tempname(); % each worker gets a unique filename
    myMatfile = matfile(myFname, 'Writable', true);
end
%% Step 2: create a parallel.pool.Constant from the 'Composite'
% This allows the worker-local variable to be used inside PARFOR
myMatfileConstant = parallel.pool.Constant(myMatfile);
%% Step 3: run PARFOR
parfor idx = 1:100
    resultToSave = idx * 100;
    matfileObj = myMatfileConstant.Value;
    % Append into 'testOut', storing the index
    matfileObj.testOut(1, idx) = resultToSave;
    matfileObj.gotResult(1, idx) = true;
end
%% Step 4: accumulate the results on the client
% Here we retrieve the filenames from 'myFname' Composite,
% and use them to accumulate the overall result
outmatfile = matfile('out.mat', 'Writable', true);
for idx = 1:numel(myFname)
    workerFname = myFname{idx};
    workerMatfile = matfile(workerFname);
    workerOutSz = size(workerMatfile, 'testOut');
    for jdx = 1:workerOutSz(2)
        if workerMatfile.gotResult(1, jdx)
            outmatfile.out(1, jdx) = workerMatfile.testOut(1, jdx);
        end
    end
end
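For the 4D case in the question, the same pattern might look roughly like the sketch below. It is only an illustration under some assumptions, not tested against your real data: the dimensions are the ones from the question, the per-volume "calculation" is the same artificial `i` fill, and seeding each worker file with one zero volume (so that `testOut` and `gotResult` exist even on a worker that happens to run no iterations) is my own addition. Note also that each worker file is indexed by the global volume number, so it can grow up to the highest index that worker handled.

```matlab
nrows = 200; ncols = 200; nplanes = 32; nvolumes = 10;

%% Step 1: one MAT-file per worker, seeded with a single zero volume
spmd
    myFname = [tempname(), '.mat'];              % unique filename per worker
    myMatfile = matfile(myFname, 'Writable', true);
    myMatfile.testOut = zeros(nrows, ncols, nplanes, 1, 'single');
    myMatfile.gotResult = false(1, 1);
end

%% Step 2: make the worker-local matfile objects usable inside PARFOR
myMatfileConstant = parallel.pool.Constant(myMatfile);

%% Step 3: each iteration writes its volume into the local worker's file
parfor i = 1:nvolumes
    vol = i * ones(nrows, ncols, nplanes, 'single');  % stand-in for the real calculation
    matfileObj = myMatfileConstant.Value;
    matfileObj.testOut(:, :, :, i) = vol;             % grows along the 4th dimension
    matfileObj.gotResult(1, i) = true;
end

%% Step 4: merge on the client, one volume at a time
outmatfile = matfile('out.mat', 'Writable', true);
% Assigning the last element sizes the variable on disk without
% building the full 4D array in memory first.
outmatfile.testOut(nrows, ncols, nplanes, nvolumes) = single(0);
for w = 1:numel(myFname)
    workerFname = myFname{w};
    workerMatfile = matfile(workerFname);
    got = workerMatfile.gotResult;                    % small logical row vector
    for v = find(got)
        outmatfile.testOut(:, :, :, v) = workerMatfile.testOut(:, :, :, v);
    end
    delete(workerFname);                              % remove the temporary file
end
```

Only one volume is ever held in memory on the client during the merge, which is the point of going through matfile in the first place. If the zero padding in the worker files turns out to cost too much disk space, an alternative would be to append volumes to each worker file in arrival order and store the volume numbers alongside them.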