Solved – Measuring the Accuracy of an SVM-Based Model

machine-learning, r, svm

I have developed a model which evaluates a user based on how important they are to the organization.
For that purpose I have generated 1000 records for 1000 users. There is one dependent variable, "Value", and a number of independent features which contribute to the "Value" of the user. "Value" can be any value between 1 and 1000.

I have split the data into training and testing sets in a 90:10 ratio, and when I ran the SVM algorithm I saw that the predictions on the testing data matched the actual values well.
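
In case the split matters: a simplified sketch of a 90:10 split along these lines (not my exact code; users stands in for my full data frame):

# 90:10 split into training and testing data
set.seed(123)                                    # for reproducibility
train.idx  <- sample(nrow(users), size = round(0.9 * nrow(users)))
train.data <- users[train.idx, ]                 # 90% for training
test.data  <- users[-train.idx, ]                # 10% held out for testing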

Now I am looking for a function in R which will compare the predicted "Value" and the actual "Value" of the testing data and tell me how accurate the prediction of "Value" was.

I have come across confusionMatrix, but it seems it will only work when the dependent variable has two classes, like 0/1 or true/false. In my case "Value" can be any integer between 0 and 1000.

Please suggest the best approach to evaluate the accuracy and sensitivity of the model.

Adding my reply to user20160 here as I don't have enough points to add comments.

I am using the logic below to run svm on my training and testing data.

## svm() comes from the e1071 package
library(e1071)

## separate feature and class variables
test.feature.vars <- test.data[,-1]
test.class.var <- test.data[,1]

## fit an SVM with a radial kernel on the training data
formula.init <- as.formula("user.rating ~ .")
svm.model <- svm(formula = formula.init, data = train.data,
                 kernel = "radial", cost = 100, gamma = 1)
summary(svm.model)

## predict on the held-out test features
svm.predictions <- predict(svm.model, test.feature.vars)

And now I need to compare data = svm.predictions with reference = test.class.var.
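
For what it's worth, this is a rough sketch of the kind of comparison I think I need (assuming svm.predictions and test.class.var are both plain numeric vectors; I am not sure this is the right approach, hence the question):

# regression-style error measures instead of a confusion matrix
rmse <- sqrt(mean((svm.predictions - test.class.var)^2))   # root mean squared error
mae  <- mean(abs(svm.predictions - test.class.var))        # mean absolute error
r2   <- cor(svm.predictions, test.class.var)^2             # squared correlation
c(RMSE = rmse, MAE = mae, Rsquared = r2)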

Update 2: based on geekoverdose's answer.

Thanks, I have tried fitting the model you suggested and evaluating the RMSE metric. Here is a sample of my data:

userValue,User_Salary_Rating,USer_Exp_years,Low_Critical_App,isThirdPartyUser,isSuperUser,isSysAdm
100,18,6,2,0,0,12
10,0,0,0,0,0,0
30,0,3,0,0,0,7
26,0,3,0,0,0,3
52,0,3,0,1,0,10
71,9,0,0,0,1,10
46,0,6,0,0,0,10
29,0,0,0,0,0,15
62,9,3,0,0,0,15
57,0,3,0,1,0,15

And when I run the train command I get the error below. Please suggest what might be going wrong here.

> model <- train(x = test.data[,2:6], y = test.data$userWeight, method = 'svmLinear',
+                tuneGrid = expand.grid(C = 3**(-5:5)),
+                trControl = trainControl(method = 'repeatedcv', number = 10,
+                                         repeats = 10, savePredictions = T))
Something is wrong; all the RMSE metric values are missing:
      RMSE        Rsquared  
 Min.   : NA   Min.   : NA  
 1st Qu.: NA   1st Qu.: NA  
 Median : NA   Median : NA  
 Mean   :NaN   Mean   :NaN  
 3rd Qu.: NA   3rd Qu.: NA  
 Max.   : NA   Max.   : NA  
 NA's   :11    NA's   :11   
Error in train.default(x = test.data[, 2:6], y = test.data$userWeight,  : 
  Stopping
In addition: There were 50 or more warnings (use warnings() to see the first 50)
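
In case it helps with diagnosing this, these are the sanity checks I can run on the data passed to train (a sketch, assuming test.data is the data frame read from the sample above):

# basic sanity checks on the data passed to train()
str(test.data)                   # column types (character/factor columns would need converting)
colSums(is.na(test.data))        # missing values per column
summary(test.data$userWeight)    # the response variable used in the call above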

PS: I have already requested a merge of my accounts so that I can add comments.

Best Answer

As @user20160 and @shrey pointed out, you should address this as a regression problem and use cross-validation to obtain a model that also works on unseen data. The core reason is that your score is a conceptually continuous value and not just a regular class (your score is limited to integer values, but you can always do a simple round after your prediction, as sketched below).
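
For instance, a minimal sketch of that rounding step (preds stands in for whatever numeric predictions your model produces; the pmin/pmax clamp to the 1-1000 range is just an extra safeguard):

# turn continuous predictions back into scores in 1-1000
preds  <- c(12.3, 997.8, 1003.1, -4.2)         # hypothetical raw predictions
scores <- pmin(pmax(round(preds), 1), 1000)    # round, then clamp to the valid range
scores
# [1]   12  998 1000    1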

Here's a minimal example of how to train an SVM model with caret (currently using svmLinear as the model type, but you could change that to svmRadial etc. if you want) using repeated cross-validation:

# dummy demo data
d <- iris[,1:4]
names(d) <- c(paste0('feat', 1:3), 'score')
d$score <- round((d$score-min(d$score))/max(d$score)*1000)

# train model
library(caret)
model <- train(x = d[,1:3], y = d$score, method = 'svmLinear',
               tuneGrid = expand.grid(C = 3**(-5:5)),
               trControl = trainControl(method = 'repeatedcv', number = 10,
                                        repeats = 10, savePredictions = T))

You can now visualize the relation between predicted and observed (= real) values using a simple scatterplot. This is essentially the counterpart of what you aimed for with the confusion matrix. In the example I use the predictions stored during repeated cross-validation, but you could use a hold-out test set the same way:

plot(model$pred$pred~model$pred$obs, ylab = 'predicted', xlab = 'observed')
abline(0,1, col=2)

[Figure: prediction error – scatterplot of predicted vs. observed values with the identity line in red]

This plot gives you information about how errors occur in your prediction. Together with the usual error measures (e.g. RMSE, which caret computes automatically) you can decide whether the model already is what you wanted, or choose the best-suited model from multiple different models:

> print(model)

Support Vector Machines with Linear Kernel 

150 samples
3 predictors

No pre-processing
Resampling: Cross-Validated (10 fold, repeated 10 times) 

Summary of sample sizes: 135, 136, 136, 135, 134, 134, ... 

Resampling results across tuning parameters:

C        RMSE  Rsquared  RMSE SD  Rsquared SD
0.00412  142   0.882     21.7     0.0386     
0.0123   110   0.9       19.4     0.0327     
0.037    91.4  0.925     15.2     0.0257     
0.111    80.6  0.938     13.4     0.0225     
0.333    77.6  0.941     13.6     0.0223     
1        77    0.942     13.8     0.0225     
3        76.6  0.942     14       0.0225     
9        76.6  0.942     14.1     0.0225     
27       76.5  0.942     14.1     0.0225     
81       76.5  0.942     14.1     0.0225     
243      76.5  0.942     14.1     0.0225     

RMSE was used to select the optimal model using  the smallest value.
The final value used for the model was C = 27.
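
To put a number on the prediction error shown in the plot, you can also summarise the stored cross-validation predictions directly. A minimal sketch, assuming the model object from the example above (postResample is part of caret and reports RMSE and R², plus MAE in newer caret versions):

# keep only the predictions made with the selected cost value,
# then summarise the prediction error
best.preds <- subset(model$pred, C == model$bestTune$C)
postResample(pred = best.preds$pred, obs = best.preds$obs)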