The libsvm FAQ mentions that the labels used internally by the algorithm can differ from yours, which sometimes reverses the sign of the model's coefficients ("coefs").
For instance, if your labels are $y = [-1, +1, +1, -1, \dots]$, then the first label in $y$, which is $-1$, is treated as $+1$ inside libsvm, and your $+1$ is correspondingly treated as $-1$.
Recall that the coefs in the returned SVM model are actually $\alpha_n\,y_n$, so your calculated $w$ vector is affected by this reversal of the signs of the $y_n$.
See the question "Why the sign of predicted labels and decision values are sometimes reversed?" in the libsvm FAQ.
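As a toy illustration of why the relabeling matters (a base-R sketch with made-up $\alpha$ values and support vectors, not output from libsvm): since the stored coefficients are $\alpha_n y_n$, flipping every $y_n$ flips the sign of $w = \sum_n \alpha_n y_n x_n$:

```r
set.seed(1)
X     <- matrix(rnorm(6), nrow = 3)  # 3 "support vectors" with 2 features (made up)
alpha <- c(0.5, 0.3, 0.8)            # hypothetical Lagrange multipliers
y     <- c(-1, 1, 1)                 # your labels

w_yours  <- colSums(alpha * y * X)     # w computed with your labels
w_libsvm <- colSums(alpha * (-y) * X)  # w after the internal relabeling
all.equal(w_yours, -w_libsvm)          # the two w vectors differ only in sign
```

So the decision boundary itself is unchanged; only the orientation of $w$ (and hence the sign of the decision values) flips.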
Try using the caret package:

```
library(caret)
set.seed(12345)

# Create simulated data
topxdata <- matrix(rnorm(200, mean = 0, sd = 1), nrow = 20, ncol = 10)
botxdata <- matrix(rnorm(200, mean = 1, sd = 1), nrow = 20, ncol = 10)
xdata <- rbind(topxdata, botxdata)
colnames(xdata) <- 1:10
ydata <- as.factor(c(rep("Top", 20), rep("Bottom", 20)))

# Set up cross-validation
ctrl <- trainControl(method = "repeatedcv",          # 10-fold cross-validation
                     repeats = 5,                    # 5 repetitions of CV
                     summaryFunction = twoClassSummary,  # use AUC to pick the best model
                     classProbs = TRUE)

# Train and tune the SVM
svm.tune <- train(x = xdata,
                  y = ydata,
                  method = "svmRadial",              # radial kernel
                  tuneLength = 5,                    # try 5 values of the cost C
                  preProc = c("center", "scale"),    # center and scale the data
                  metric = "ROC",
                  trControl = ctrl)

svm.tune
```
Printing `svm.tune` gives a result like:

```
Support Vector Machines with Radial Basis Function Kernel

40 samples
10 predictors
 2 classes: 'Bottom', 'Top'

Pre-processing: centered (10), scaled (10)
Resampling: Cross-Validated (10 fold, repeated 5 times)
Summary of sample sizes: 36, 36, 36, 36, 36, 36, ...
Resampling results across tuning parameters:

  C     ROC    Sens  Spec
  0.25  0.980  0.85  0.91
  0.50  0.975  0.85  0.90
  1.00  0.955  0.83  0.88
  2.00  0.945  0.82  0.84
  4.00  0.945  0.81  0.77

Tuning parameter 'sigma' was held constant at a value of 0.06064355
ROC was used to select the optimal model using the largest value.
The final values used for the model were sigma = 0.06064355 and C = 0.25.
```
Best Answer
It seems you have missing values in your predictors. I can reproduce this behavior with the following example: