MATLAB: Why do I get an error when trying to deploy applications to Cloudera Spark using the MATLAB API for Spark?

Tags: cloudera, gentraversableonce, MATLAB Compiler, scala, spark

I am trying to follow the example described on this documentation page:
This workflow is further explained in this MATLAB answers post:
When I run the job with the "yarn-client" master, I see many warning and error messages in the YARN log. Specifically, the log contains the following message:
java.lang.NoClassDefFoundError: scala/collection/GenTraversableOnce$class
I am currently using:
– Cloudera CDH 5.14.1
– Spark 2.2.0 v Cloudera2-1
– Scala version 2.11.8 (Java HotSpot™ 64-Bit Server VM, Java 1.8.0_161)
The exact commands that I used were the following:
1) For compilation:
>> mcc -C -W 'Spark:flightsByCarrierDemoApp' flightsByCarrierDemo.m
2) For execution:
>> ./run_flightsByCarrierDemoApp.sh /pkgs/cdh/matlab/runtime/v92 yarn-client hdfs://nameservice1/user/ed003898/airlinesmall.csv

Best Answer

First, make sure that you set the property 'spark.executorEnv.MCR_CACHE_ROOT' to something like '/tmp/matlabapp'.
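As a minimal sketch, the property can be placed in the map that is later passed to SparkConf as 'SparkProperties'. The variable name sparkProperties matches the one used in the SparkConf call in this answer; the path '/tmp/matlabapp' is only an example and should be a directory that the executors can write to on your cluster.

```matlab
% Sketch: collect Spark properties in a containers.Map keyed by property name.
% '/tmp/matlabapp' is an example location, not a required value.
sparkProperties = containers.Map( ...
    {'spark.executorEnv.MCR_CACHE_ROOT'}, ...
    {'/tmp/matlabapp'});
```

Further properties can be added by extending both cell arrays with matching keys and values.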
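Next, when using Spark version 2, the compilation command must specify the Spark major version after the application name. A NoClassDefFoundError for scala/collection/GenTraversableOnce$class usually indicates a Scala binary version mismatch, which is consistent with an application built for a different Spark major version running on a Spark 2 (Scala 2.11) cluster. The compilation command should be: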
>> mcc -C -W 'Spark:flightsByCarrierDemoApp,2' flightsByCarrierDemo.m
Also, it is recommended that the code explicitly set the SparkVersion, as shown below:
>> conf = SparkConf( ...
       'AppName', 'flightsByCarrierDemo', ...
       'Master', master, ...
       'SparkProperties', sparkProperties, ...
       'SparkVersion', '2' ...
       );