MATLAB: Why does a script that uses “parfeval” not work when submitted as a job on an HPC cluster?

hpc, job, parallel, Parallel Computing Toolbox, parfeval

I have a script "myScript.m" that uses "parfeval" to run some functions in parallel. These functions generate files to an output directory on a network drive. When I run this script on my local machine, it works and I can see the output files on the network drive. It also works when I run an interactive MATLAB session (with a GUI) on the HPC cluster. However, it does not work when I submit a job script to our HPC job scheduler that executes:
matlab -nodisplay -r myScript
Even more puzzling is that it works if I make my code run serially. What is going on?

Best Answer

The cause of this issue is that "myScript" does not have a corresponding "wait" or "fetchOutputs" call for each "parfeval" call. "parfeval" takes a function and sends it to a worker to run in the background, and the current script continues executing without waiting for that function to finish. This is a form of "asynchronous computing".
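As a minimal sketch of this asynchronous behavior (it assumes a parallel pool can be started with "gcp", and it uses "pause" only as a stand-in for a long-running function), the script below reaches its last line while the worker is still busy:

```matlab
% Sketch: parfeval returns immediately while the work runs on a worker.
pool = gcp();                       % start or reuse the current parallel pool

% Ask a worker to run pause(5); the 0 means no outputs are requested.
f = parfeval(pool, @pause, 0, 5);

% The future is usually still queued or running at this point.
disp(f.State)
disp('The script continues without waiting for the worker.')
```

If this were the end of the script, nothing would keep MATLAB alive until the paused worker finishes.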
Call "wait" on the "FevalFuture" object returned by "parfeval" to block the current script until the corresponding background function finishes. Alternatively, use "fetchOutputs", which blocks in the same way and also returns the outputs of the background function.
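The two blocking calls might be used as in the following sketch. The functions "pause" and "magic" are placeholders chosen for illustration, not the functions from the original script:

```matlab
pool = gcp();                        % start or reuse the current parallel pool

% Case 1: a function with no outputs. wait blocks until it has finished.
f1 = parfeval(pool, @pause, 0, 2);
wait(f1);

% Case 2: a function with one output. fetchOutputs blocks until the
% function has finished and then returns its result.
f2 = parfeval(pool, @magic, 1, 4);
M = fetchOutputs(f2);                % M is the 4-by-4 magic square
```

"wait" suits functions that only have side effects, such as writing files, while "fetchOutputs" suits functions whose return values are needed by the rest of the script.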
Without these calls, "myScript" finishes before the background functions do. When this happens on the HPC cluster, the job scheduler assumes your job has completed and releases the resources allocated to it while the background functions are still running, so the expected output files never appear. In an interactive session, MATLAB stays open after the script ends, which gives the background functions time to finish. The serial version works because each function call blocks until it returns.
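For a script that launches several file-writing tasks, one possible pattern is to collect the futures in an array and wait on all of them before the script ends. The sketch below is only an illustration: the output directory under "tempdir", the file names, and the use of "writematrix" (available from R2019a) are assumptions standing in for the real output location and functions.

```matlab
% Hypothetical output directory; replace with the real network location.
outDir = fullfile(tempdir, 'parfeval_demo');
if ~exist(outDir, 'dir')
    mkdir(outDir);
end

pool = gcp();                                  % start or reuse the pool
numTasks = 4;
futures(1:numTasks) = parallel.FevalFuture;    % preallocate the future array

for k = 1:numTasks
    outFile = fullfile(outDir, sprintf('result_%d.txt', k));
    % Each task writes one small matrix to its own file on a worker.
    futures(k) = parfeval(pool, @writematrix, 0, magic(k + 2), outFile);
end

% Block until every background task has finished, so the script (and
% therefore the scheduler job) does not end while files are being written.
wait(futures);
```

With the final "wait" in place, the script submitted through the scheduler does not return until all output files have been written.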