Solved – SVM prediction sensitivity when compared to neural networks and logistic regression

Tags: classification, logistic, machine-learning, neural-networks, svm

I want to classify a fairly rare status (about 2% of roughly 2000 observations) using several predictors. I have tried logistic regression, a neural network, and support vector machines.

All the predictors in the logistic regression are statistically significant. To avoid overfitting, I implemented 10-fold cross-validation myself for all of the methods. In each iteration I fit the model on the training data and computed the fitted values, then used the ROCR package in R to find the decision threshold that achieves 70% sensitivity on the training data. I then applied the model and that threshold to the test fold and computed the test sensitivity and positive predictive value (PPV). After 10 iterations I had 10 test sensitivities and PPVs.
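The question does this with ROCR in R; an equivalent sketch of the threshold-selection step in Python with scikit-learn (the synthetic data and variable names are illustrative, not from the original):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve
from sklearn.model_selection import train_test_split

# Synthetic imbalanced data standing in for the ~2% positive class.
X, y = make_classification(n_samples=2000, weights=[0.98, 0.02], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

# Pick the first threshold that reaches 70% sensitivity (TPR) on training data.
fpr, tpr, thr = roc_curve(y_tr, model.predict_proba(X_tr)[:, 1])
cutoff = thr[np.argmax(tpr >= 0.70)]

# Apply the same cutoff to the held-out fold and score it.
pred = (model.predict_proba(X_te)[:, 1] >= cutoff).astype(int)
tp = ((pred == 1) & (y_te == 1)).sum()
sens = tp / (y_te == 1).sum()
ppv = tp / max(pred.sum(), 1)
```

In a full run this would sit inside the 10-fold loop, with the cutoff re-estimated on each training split.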

My finding: logistic regression did best. The test sensitivities were roughly 70% and the PPVs around 16%. Very surprisingly, the SVM performed much worse: mean test sensitivity = 43%, PPV = 11%.

I am not very familiar with the theory behind SVMs, so I tried both kernlab and e1071 in R. I also experimented with C-svc, nu-svc, and C-bsvc, and tuned the SVM using tune.svm in e1071, but the performance was similar.

So my question is: was I doing something wrong, or am I missing something when fitting an SVM?

Best Answer

The SVM is designed to determine the optimal decision boundary for only one ratio of false-positive and false-negative misclassification costs, so it is not really a fair comparison to change the threshold to adjust the sensitivity. A better approach would be to tune the regularisation parameters (C or nu) for each class independently (some packages support this, some don't) by optimising a cross-validation estimate of your statistic of interest. Note that to get an unbiased performance estimate, you will need to perform a nested cross-validation.
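In R this corresponds to the class.weights argument of e1071::svm. A minimal sketch of the same idea in Python with scikit-learn, where SVC's class_weight plays the role of a per-class C, wrapped in nested cross-validation (grid values and data are illustrative assumptions):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.svm import SVC

# Synthetic stand-in for the ~2% positive class.
X, y = make_classification(n_samples=2000, weights=[0.98, 0.02], random_state=0)

# Inner loop: tune C and the minority-class weight jointly,
# optimising the statistic of interest ('recall' = sensitivity).
grid = GridSearchCV(
    SVC(kernel="rbf"),
    param_grid={"C": [0.1, 1, 10], "class_weight": [{1: w} for w in (1, 10, 50)]},
    scoring="recall",
    cv=StratifiedKFold(5),
)

# Outer loop: an unbiased performance estimate of the whole tuning procedure.
outer = cross_val_score(grid, X, y, scoring="recall", cv=StratifiedKFold(5))
```

The key point is that the tuning (inner loop) is repeated inside every outer fold, so the outer scores are not contaminated by the model selection.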

Logistic regression doesn't suffer from this problem, as its loss function is intended to minimise the error in estimating the posterior probability of class membership everywhere, rather than merely at p=0.5 (or some other value depending on the ratio of misclassification costs). I no longer use the SVM very much because, for most applications, I actually want the probabilities that logistic regression provides. Instead I use regularised logistic regression (which gives similar over-fitting avoidance to the SVM), and kernel logistic regression if I want a non-linear model (or Gaussian process classifiers for a Bayesian equivalent, although the difference in performance between GPC and KLR is generally quite small).
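To illustrate why probabilities are convenient here: with regularised logistic regression you can fit once and pick any operating point afterwards by moving the probability cutoff, trading sensitivity against PPV. A sketch in Python with scikit-learn (the cutoff values are arbitrary examples):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegressionCV

# Synthetic stand-in for the ~2% positive class.
X, y = make_classification(n_samples=2000, weights=[0.98, 0.02], random_state=0)

# L2-regularised logistic regression; the penalty strength is chosen by CV,
# giving over-fitting control comparable in spirit to tuning C in an SVM.
clf = LogisticRegressionCV(Cs=10, cv=5, scoring="neg_log_loss",
                           max_iter=2000).fit(X, y)

# One fitted model, two operating points chosen after the fact:
p = clf.predict_proba(X)[:, 1]
high_sens = (p >= 0.02).astype(int)  # permissive cutoff -> more positives flagged
high_ppv = (p >= 0.50).astype(int)   # strict cutoff -> fewer, surer positives
```

Lowering the cutoff can only add predicted positives, so sensitivity is monotone in the threshold; with an SVM, by contrast, each operating point really calls for refitting with different per-class costs.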