MATLAB: Regression Learner App RMSE different from validating dataset

Tags: cross, model, r^2, r-squared, rsme, Statistics and Machine Learning Toolbox, validation

I'm using the Regression Learner app with the Linear SVM option to train a regression model on a dataset. I then record the R^2 and RMSE values reported by the app. When I export the model and run it on the same training dataset from the command line, I get very different R^2 and RMSE values. I only see this problem with my large datasets (where the number of features is large, roughly 10,000 to 100,000).

Best Answer

The RMSE and R-squared values reported by the Regression Learner App are computed from cross-validated predictions. This is not the same as passing the training dataset through the exported model and computing RMSE and R-squared from those predictions. The app uses k-fold cross-validation: each observation is predicted by a model trained on the other folds, so the reported metrics do not describe how well the final model fits its own training data.
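The difference between the two calculations can be sketched from the command line. This is a minimal illustration, not the code the app generates: the data are synthetic, the sizes `n` and `p` are arbitrary, and 5 folds is an assumption that should be set to whatever validation scheme was chosen in the app.

```matlab
% Synthetic data, assumed for illustration only (not the asker's dataset)
rng(0);                                   % make the example reproducible
n = 200;  p = 20;
X = randn(n, p);
y = X(:, 1:3) * [2; -1; 0.5] + 0.5 * randn(n, 1);

% Linear SVM regression trained on all rows
mdl = fitrsvm(X, y, 'KernelFunction', 'linear', 'Standardize', true);

% Resubstitution: predict the same rows the model was trained on
yhatResub = predict(mdl, X);
rmseResub = sqrt(mean((y - yhatResub).^2));
r2Resub   = 1 - sum((y - yhatResub).^2) / sum((y - mean(y)).^2);

% K-fold cross-validation: each row is predicted by a model that never saw it
cvmdl  = crossval(mdl, 'KFold', 5);       % 5 folds is an assumption
yhatCV = kfoldPredict(cvmdl);
rmseCV = sqrt(mean((y - yhatCV).^2));
r2CV   = 1 - sum((y - yhatCV).^2) / sum((y - mean(y)).^2);

fprintf('Resubstitution:  RMSE = %.3f, R^2 = %.3f\n', rmseResub, r2Resub);
fprintf('Cross-validated: RMSE = %.3f, R^2 = %.3f\n', rmseCV, r2CV);
```

The cross-validated pair corresponds to what the app reports, and the resubstitution pair corresponds to running an exported model on its own training data.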
For more information on Cross-Validation, please see the following link:
A good way to see exactly how the app calculates these values is to generate the code from the Regression Learner App and inspect it.
The RMSE and R-squared values can be very far apart when there are too few data points relative to the very large number of features. In that case the model is likely overfitting: it fits the training data closely but predicts the held-out folds poorly.
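The effect of having far more features than observations can be reproduced on a small synthetic problem. In this rough sketch the sizes (40 rows, 2000 columns) and the 5-fold setting are assumptions chosen only to keep the example quick. Only the first feature carries signal, so most columns are noise.

```matlab
% Synthetic "wide" data, assumed for illustration only
rng(1);                                   % make the example reproducible
n = 40;  p = 2000;                        % many more features than rows
X = randn(n, p);
y = X(:, 1) + 0.5 * randn(n, 1);          % only feature 1 is informative

mdl = fitrsvm(X, y, 'KernelFunction', 'linear');

% resubLoss and kfoldLoss return mean squared error by default
rmseResub = sqrt(resubLoss(mdl));         % error on the training rows
cvmdl     = crossval(mdl, 'KFold', 5);    % 5 folds is an assumption
rmseCV    = sqrt(kfoldLoss(cvmdl));       % error on held-out rows

fprintf('Resubstitution RMSE:  %.3f\n', rmseResub);
fprintf('Cross-validated RMSE: %.3f\n', rmseCV);
```

With data shaped like this, the resubstitution RMSE is typically much smaller than the cross-validated RMSE. This is the same kind of gap as between the command-line result and the values shown in the app.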