I have developed a model that evaluates a user based on how important they are to the organization.
For that purpose I have generated 1000 records for 1000 users. Here I have one dependent variable, "Value", and other independent features which contribute to the "Value" of the user. The "Value" can be anything between 1 and 1000.
I have split the data into training and testing sets in a 90:10 ratio, and when I ran the SVM algorithm I saw that the predictions on the testing data matched well.
Now I am looking for a function in R that will compare the predicted "Value" and the actual "Value" of the testing data and tell me how accurate the prediction of "Value" was.
I have come across confusionMatrix, but it seems it only works when the dependent variable has two classes, like 0/1 or true/false. In my case "Value" can be any integer between 1 and 1000.
Please suggest the best approach to evaluate the accuracy and sensitivity of the model.
Adding this as an answer in reply to user20160, as I don't have enough points to add comments.
I am using the logic below to run SVM on my training and testing data.
## separate feature and class variables
test.feature.vars <- test.data[, -1]
test.class.var <- test.data[, 1]

## fit the SVM model on the training data
formula.init <- as.formula("user.rating ~ .")
svm.model <- svm(formula = formula.init, data = train.data,
                 kernel = "radial", cost = 100, gamma = 1)
summary(svm.model)

## predict on the held-out test features
svm.predictions <- predict(svm.model, test.feature.vars)
And now I need to compare data = svm.predictions with reference = test.class.var.
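For a continuous target, the usual counterparts of classification accuracy are error measures such as RMSE, MAE, and the squared correlation. A minimal sketch of computing them by hand; the two vectors here are made-up numbers standing in for svm.predictions and test.class.var:

```r
## synthetic stand-ins for svm.predictions and test.class.var
actual    <- c(100, 10, 30, 26, 52, 71, 46, 29, 62, 57)
predicted <- c( 95, 14, 28, 30, 49, 75, 44, 33, 58, 60)

rmse <- sqrt(mean((predicted - actual)^2))  # root mean squared error
mae  <- mean(abs(predicted - actual))       # mean absolute error
r2   <- cor(predicted, actual)^2            # squared correlation

c(RMSE = rmse, MAE = mae, R2 = r2)
```

Lower RMSE/MAE means better predictions; RMSE penalizes large individual errors more heavily than MAE does.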
Update 2: Based on what geekoverdose has answered.
Thanks, I have tried fitting the model you suggested and evaluating the RMSE metric. Here is a sample of my data:
userValue,User_Salary_Rating,USer_Exp_years,Low_Critical_App,isThirdPartyUser,isSuperUser,isSysAdm
100,18,6,2,0,0,12
10,0,0,0,0,0,0
30,0,3,0,0,0,7
26,0,3,0,0,0,3
52,0,3,0,1,0,10
71,9,0,0,0,1,10
46,0,6,0,0,0,10
29,0,0,0,0,0,15
62,9,3,0,0,0,15
57,0,3,0,1,0,15
And when I run the train command I am getting the error below. Please suggest what might be going wrong here.
model <- train(x = test.data[, 2:6], y = test.data$userWeight,
               method = 'svmLinear',
               tuneGrid = expand.grid(C = 3**(-5:5)),
               trControl = trainControl(method = 'repeatedcv', number = 10,
                                        repeats = 10, savePredictions = T))
Something is wrong; all the RMSE metric values are missing:
RMSE Rsquared
Min. : NA Min. : NA
1st Qu.: NA 1st Qu.: NA
Median : NA Median : NA
Mean :NaN Mean :NaN
3rd Qu.: NA 3rd Qu.: NA
Max. : NA Max. : NA
NA's :11 NA's :11
Error in train.default(x = test.data[, 2:6], y = test.data$userWeight, :
Stopping
In addition: There were 50 or more warnings (use warnings() to see the first 50)
PS: I have already requested a merge of my accounts so that I can add comments.
Best Answer
As @user20160 and @shrey pointed out, you should address this as a regression problem and use cross validation to obtain a model that also works on unseen data. The core reason is that your score is conceptually a continuous value, not just a regular class (your score is limited to integer values, but you can always apply a simple round after your prediction).
Here's a minimal example of how to train an SVM model with caret (currently using svmLinear as the model type, but you could change that to svmRadial etc. if you want) using repeated cross validation. You can then visualize the relation between predicted and observed (= real) values using a simple scatterplot. This is essentially the counterpart of what you aimed for with the confusion matrix. In the example I use the results stored during repeated cross validation, but you could use a hold-out test set the same way.
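A sketch of what such a setup could look like. The data frame, column names, and the linear relationship here are synthetic placeholders for illustration; adapt them to your own data:

```r
## caret SVM regression with repeated cross validation (synthetic data)
library(caret)    # train(), trainControl()
library(kernlab)  # backend used by method = 'svmLinear'

set.seed(42)
n <- 100
d <- data.frame(f1 = rnorm(n), f2 = rnorm(n), f3 = rnorm(n))
d$userValue <- 500 + 120 * d$f1 + 80 * d$f2 + rnorm(n, sd = 25)

model <- train(x = d[, c('f1', 'f2', 'f3')], y = d$userValue,
               method = 'svmLinear',
               tuneGrid = expand.grid(C = 3^(-5:5)),
               trControl = trainControl(method = 'repeatedcv',
                                        number = 10, repeats = 10,
                                        savePredictions = TRUE))

## predicted vs. observed values from the CV folds of the selected C
best <- model$pred[model$pred$C == model$bestTune$C, ]
plot(best$pred, best$obs, xlab = 'predicted', ylab = 'observed')
abline(0, 1, col = 'red')  # points on this line are perfect predictions
```

With savePredictions = TRUE, caret keeps the fold-level predictions in model$pred; filtering to the selected C value gives one predicted/observed pair per held-out sample and repeat, which is what the scatterplot shows.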
This plot gives you information about how errors happen in your prediction. Together with the usual error measures (e.g. RMSE, which caret computes automatically), you can then decide whether the model is already what you wanted, or choose the best-suited model from multiple different candidates.
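Beyond the plot, caret's postResample() helper computes those error measures directly for any pair of predicted and observed values, e.g. from a hold-out test set. A small sketch with made-up numbers:

```r
## postResample() returns RMSE and Rsquared (and, in recent caret
## versions, MAE) for a predicted/observed pair
library(caret)

predicted <- c(95, 14, 28, 30, 49, 75)
observed  <- c(100, 10, 30, 26, 52, 71)
postResample(pred = predicted, obs = observed)
```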