I wanted to run a grid search to find suitable parameters for my SVM model, but I have discovered that fitrsvm gives inconsistent error values when the epsilon parameter is generated by a for loop. For example, the RMSE for my model with epsilon = 0.8 differs depending on the range the loop covers. If I use the for loop:
for epsilon = 0.8:.1:1.2
compared with if I use the for loop
for epsilon = 0.1:.1:1.2
The RMSEs at epsilon = 0.8 are 2.6868 and 2.7020 respectively.
I thought this might be a floating-point error, so I tried to ensure that the epsilon value passed to fitrsvm was exactly 0.8. I did this by creating the variable d_epsilon (line 17) and passing its value to fitrsvm (i.e. by changing line 26 to 'Epsilon', d_epsilon), but this did not work. By contrast, using c_epsilon (line 16), which is completely independent of the for loop, does work.
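The floating-point suspicion can be checked directly, without fitting any model. The following is a minimal sketch using only base MATLAB; the variable names a, b and d are chosen here for illustration. It prints how far the value produced by each loop range is from the literal 0.8, without assuming what those differences will be.

```matlab
% Compare the values the two loop ranges produce with the literal 0.8.
a = 0.1:0.1:1.2;        % long range: the 8th element is nominally 0.8
b = 0.8:0.1:1.2;        % short range: the 1st element is the start value itself
fprintf('a(8) - 0.8 = %g\n', a(8) - 0.8);
fprintf('b(1) - 0.8 = %g\n', b(1) - 0.8);

% Same construction as d_epsilon in the script: round, convert to a string, parse.
% This should remove any last-digit difference from the loop value.
d = str2double(string(round(a(8), 2)));
fprintf('d - 0.8    = %g\n', d - 0.8);
```

If d differs from 0.8 by zero and the RMSE still changes with the loop range, rounding in epsilon is unlikely to be the explanation, which matches the observation that d_epsilon did not help.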
In my real project, I use nested loops to search for values for Epsilon, BoxConstraint, and KernelScale. The inconsistencies in my results are about 10%. (I am using a grid search because the parameters returned by OptimizeHyperparameters perform worse than some of the parameters cited in journal articles for my dataset, UCI's auto-mpg.)
clear all
%%read in auto-mpg.csv. This is a cleaned version of UCI dataset auto-mpg
data = readtable('auto-mpg.csv','ReadVariableNames',false);
VarNames = {'mpg','cylinders' 'displacement' 'horsepower' 'weight' 'acceleration' ...
    'modelYear' 'origin' 'carName'};
data.Properties.VariableNames = VarNames;
data = [data(:,2:9) data(:,1)];
data.carName=[];
%%carry out 10 fold cross-validation with different epsilon values
testResults_SVM=[];
testActual_SVM=[];
rng('default')
c = cvpartition(data.mpg,'KFold',10);
for epsilon = 0.1:0.1:1.2

    %c_epsilon= 0.80000;
    %d_epsilon = str2double(string(round(epsilon,2)))
    for fold = 1:10
        cv_trainingData = data(c.training(fold), :);
        cv_testData = data(c.test(fold), :);
        AutoSVM = fitrsvm(cv_trainingData,'mpg',...
            'KernelFunction', 'gaussian', ...
            'PolynomialOrder', [], ...
            'KernelScale', 5.5, ...
            'BoxConstraint', 100, ...
            'Epsilon', epsilon, ...
            'Standardize', true);
        convergenceChk(fold)=AutoSVM.ConvergenceInfo.Converged;
        testResults_SVM=[testResults_SVM;predict(AutoSVM,cv_testData)];
        testActual_SVM=[testActual_SVM;cv_testData.mpg];
    end
    %%generate summary statistics and plots
    residual_SVM = testResults_SVM-testActual_SVM;
    AutoMSE_SVM=((sum((residual_SVM).^2))/size(testResults_SVM,1));
    AutoRMSE_SVM = sqrt(AutoMSE_SVM);
    if round(epsilon,4) == 0.8
        AutoRMSE_SVM
    end
end
A copy of my dataset and code is attached or can be accessed via: https://drive.google.com/open?id=1ph1KwdGgFbmNVSwI63LREcXEDN3hkP_Q
Does anyone know a workaround for this? I am using MATLAB R2017b.
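One thing that may be worth checking in the script above, offered as a sketch and not a confirmed diagnosis: testResults_SVM and testActual_SVM are initialised once, before the epsilon loop, and are only ever appended to. If so, the RMSE printed at epsilon = 0.8 pools predictions from every earlier epsilon in the same run, and how many it pools depends on where the loop starts. The toy example below uses made-up residuals and no fitrsvm call; the names starts, pooled, fakeResiduals and rmseAt3 are hypothetical.

```matlab
% Toy illustration with made-up numbers: accumulating residuals across
% outer-loop iterations makes the reported RMSE depend on the loop start.
starts  = [1 3];                 % two different starting points for the outer loop
rmseAt3 = zeros(1, 2);           % RMSE reported when the loop variable equals 3
for s = 1:numel(starts)
    pooled = [];                 % initialised once, outside the loop, as in the script
    for k = starts(s):4
        fakeResiduals = k * ones(5, 1);     % hypothetical residuals for setting k
        pooled = [pooled; fakeResiduals];   % appended, never reset
        if k == 3
            rmseAt3(s) = sqrt(mean(pooled.^2));
        end
    end
end
disp(rmseAt3)   % the two entries differ although both are reported "at k = 3"
```

If this is what is happening in the real script, moving the two initialisations to the top of the epsilon loop body would make each reported RMSE depend only on its own epsilon.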
Best Answer