MATLAB: Parallelization and mex-compiled code

parfor parallelization mex fortran90

I want to optimize a mex-compiled function (Fortran 90 source) defined over a 1D interval by computing its values on a sufficiently fine sampling. It works fine with a for-loop, but when I try parfor (for speed) the mex-compiled code crashes and I get an error from one of the workers. Is this a documented problem, and does anyone have suggestions for how to localize what goes wrong?
I run MATLAB R2013a on Ubuntu 13.10 on a 16-core (32 virtual) machine, and I get 12 workers when I run matlabpool.

Best Answer

You should try running a plain for-loop first, but with the iterations in random order, i.e., instead of
for i=1:n
...
end
run as
for i=randperm(n)
...
end
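This is a good way to test whether your code is independent of the order of the iterations (a basic requirement of parfor) before the Parallel Computing Toolbox even gets involved. If the shuffled loop also fails or gives different results, the problem most likely lies in the mex code itself (for example, state carried over from one call to the next) rather than in the parallel machinery.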
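One possible way to set up that check is sketched below. It evaluates the function once in natural order and once in shuffled order, then compares the two result vectors. The handle f is only a hypothetical stand-in for the mex-compiled function, and n, x, yOrdered and yShuffled are illustrative names that do not come from the question.

```matlab
% Hypothetical stand-in: replace f with a call to your mex-compiled function.
f = @(x) sin(x).^2;

n = 1000;                    % number of sample points (assumed value)
x = linspace(0, 1, n);       % sampling of the 1D interval (assumed bounds)

% Reference run: iterations in natural order.
yOrdered = zeros(1, n);
for i = 1:n
    yOrdered(i) = f(x(i));
end

% Test run: same iterations, random order.
yShuffled = zeros(1, n);
for i = randperm(n)
    try
        yShuffled(i) = f(x(i));
    catch err
        % Reports errors raised through MATLAB's error mechanism.
        % A hard crash (e.g. a segfault) inside the mex file is not caught here.
        fprintf('Iteration %d (x = %g) failed: %s\n', i, x(i), err.message);
        rethrow(err);
    end
end

% True when every iteration gives the same value regardless of order.
orderIndependent = isequal(yOrdered, yShuffled);
```

If orderIndependent comes out false, or the shuffled loop stops at a particular iteration, the printed index is a starting point for finding out which call the mex code handles differently.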