MATLAB: How to set LD_PRELOAD for a Hadoop cluster

glibc, hadoop, mapreduce, MATLAB Compiler, rhel, shim, spark

I am testing two scripts, one that uses Spark and one that uses MapReduce, on a Hadoop cluster running Red Hat Enterprise Linux 7.6. Both fail with the following error:
Error: failed /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.232.b09-0.el7_7.x86_64/jre/lib/amd64/server/libjvm.so, ...
because /apps/matlab/R2020b/sys/os/glnxa64/libstdc++.so.6: undefined symbol: __cxa_thread_atexit_impl
This seems to be because the nodes have GLIBC 2.17 installed, while the symbol __cxa_thread_atexit_impl was only introduced in GLIBC 2.18. I understand that MATLAB R2020b ships a shim library that lets it run against GLIBC 2.17, but how can I force the cluster to find and load it?
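One way to confirm that the symbol really is missing from the system glibc is to inspect it on a node. This is a quick diagnostic sketch; the /lib64/libc.so.6 path assumes a standard 64-bit RHEL 7 layout, and nm requires binutils to be installed:
% Check whether the system glibc exports __cxa_thread_atexit_impl
[status, ~] = system('nm -D /lib64/libc.so.6 | grep __cxa_thread_atexit_impl');
if status ~= 0
    disp('__cxa_thread_atexit_impl not exported: glibc is older than 2.18')
end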

Best Answer

You can preload the shim from the MATLAB scripts themselves through cluster properties, without modifying the system environment variables on the nodes. The two scripts require slightly different code because they are built on different technologies: MapReduce child tasks take their environment from Hadoop job properties, while Spark executors take theirs from Spark properties. In both snippets, replace $MATLAB_ROOT with the full path to the MATLAB installation on the worker nodes.
For "mapreduce", modify your code as below:
cluster = parallel.cluster.Hadoop(..);
...
% Preload the shim in the environment of every map/reduce child task
cluster.HadoopProperties('mapred.child.env') = 'LD_PRELOAD=$MATLAB_ROOT/bin/glnxa64/glibc-2.17_shim.so';
...
mapreducer(cluster);
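If you would rather not embed the path in one long string, a small sketch like the following builds the property value with fullfile. The /apps/matlab/R2020b root is taken from the error message above; substitute the actual installation root on your workers:
workerMatlabRoot = '/apps/matlab/R2020b';   % MATLAB root on the worker nodes (assumption)
shim = fullfile(workerMatlabRoot, 'bin', 'glnxa64', 'glibc-2.17_shim.so');
cluster.HadoopProperties('mapred.child.env') = ['LD_PRELOAD=' shim];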
For the Spark-based script, modify the code as below:
cluster = parallel.cluster.Hadoop(..);
...
% Preload the shim in the environment of every Spark executor
cluster.SparkProperties('spark.executorEnv.LD_PRELOAD') = '$MATLAB_ROOT/bin/glnxa64/glibc-2.17_shim.so';
...
mapreducer(cluster);
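After mapreducer(cluster) above, the Spark-backed job runs as usual. A minimal tall-array usage sketch follows; the HDFS path and column name are hypothetical:
ds = datastore('hdfs:///user/me/data/*.csv');   % hypothetical HDFS location
t = tall(ds);                                   % deferred tall array evaluated by Spark on the cluster
result = gather(mean(t.ArrDelay, 'omitnan'));   % hypothetical column name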