Solved – Compare Models. LOOCV implementation in R

modeling, r

I want to compare three models: a linear regression model, a regression tree (from rpart), and a MARS model (from the mda package).

I want to compare the models using leave-one-out cross-validation (LOOCV), with the mean squared error (MSE) and the mean absolute percentage error (MAPE) as metrics. I have the following implementation in R:

library(data.table)
library(rpart)
library(mda)

#Load Sample-Data
data(trees)

#The following models should be compared:
# lm(Volume~Girth+Height, data=trees)
# rpart(Volume~Girth+Height, data=trees)
# mars(trees[,-3], trees[3])

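# Returns the leave-one-out prediction errors (observed - predicted), one per row of trees
LOOCV <- function(modelCall) {
  sapply(seq_len(nrow(trees)), function(i) {
    training <- trees[-i, ]   # modelCall refers to this object by name
    test <- trees[i, ]

    # eval() runs the quoted call in this function's frame, so it sees `training`
    fit <- eval(modelCall)
    testValue <- predict(fit, test[1:2])

    as.numeric(test$Volume - testValue)
  })
}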

LOOCV_MSE<-function(modelCall) {
   sum(LOOCV(modelCall)^2)/nrow(trees)
}

LOOCV_RMSE<-function(modelCall) {
   sqrt(LOOCV_MSE(modelCall))
}

LOOCV_MAPE<-function(modelCall) {
  mean(abs(LOOCV(modelCall) / trees$Volume)) * 100
}


cat("Cross-Validation Metrics:\n")
cat("-------------------------\n")
cat("LOOCV MSE for LM:", LOOCV_MSE(quote(lm(Volume~Girth+Height, data=training))),"\n")
cat("LOOCV MSE for CART:", LOOCV_MSE(quote(rpart(Volume~Girth+Height, data=training))),"\n")
cat("LOOCV MSE for MARS:", LOOCV_MSE(quote(mars(training[,-3], training[3]))),"\n")
cat("\n")

cat("LOOCV RMSE for LM:", LOOCV_RMSE(quote(lm(Volume~Girth+Height, data=training))),"\n")
cat("LOOCV RMSE for CART:", LOOCV_RMSE(quote(rpart(Volume~Girth+Height, data=training))),"\n")
cat("LOOCV RMSE for MARS:", LOOCV_RMSE(quote(mars(training[,-3], training[3]))),"\n")
cat("\n")

cat("LOOCV MAPE for LM:", LOOCV_MAPE(quote(lm(Volume~Girth+Height, data=training))),"\n")
cat("LOOCV MAPE for CART:", LOOCV_MAPE(quote(rpart(Volume~Girth+Height, data=training))),"\n")
cat("LOOCV MAPE for MARS:", LOOCV_MAPE(quote(mars(training[,-3], training[3]))),"\n")

Outputs:

Cross-Validation Metrics:
-------------------------
LOOCV MSE for LM: 18.15783 
LOOCV MSE for CART: 69.83769 
LOOCV MSE for MARS: 13.72282 

LOOCV RMSE for LM: 4.2612 
LOOCV RMSE for CART: 8.356895 
LOOCV RMSE for MARS: 3.704432 

LOOCV MAPE for LM: 14.6114 
LOOCV MAPE for CART: 23.51401 
LOOCV MAPE for MARS: 10.00316 

Does this implementation make sense? When would using MSE on the errors make sense? When would I use MAPE/SMAPE instead? I already read "Metric to compare models?", and the conclusion there was that it depends. Can someone explain this further? What does it depend on?

My data is not a time series, it is more like the tree example data.
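
For reference, the metrics named above are commonly defined as follows. The notation is introduced here only for illustration: $y_i$ is the observed value, $\hat{y}_i^{(-i)}$ is the prediction from the model fitted without observation $i$, and $n$ is the number of observations. SMAPE has several variants in use, and the one shown is just one of them.

```latex
\begin{align*}
\mathrm{MSE}   &= \frac{1}{n}\sum_{i=1}^{n}\bigl(y_i - \hat{y}_i^{(-i)}\bigr)^2, &
\mathrm{RMSE}  &= \sqrt{\mathrm{MSE}}, \\
\mathrm{MAPE}  &= \frac{100}{n}\sum_{i=1}^{n}\left|\frac{y_i - \hat{y}_i^{(-i)}}{y_i}\right|, &
\mathrm{SMAPE} &= \frac{100}{n}\sum_{i=1}^{n}
                  \frac{\bigl|y_i - \hat{y}_i^{(-i)}\bigr|}{\bigl(|y_i| + |\hat{y}_i^{(-i)}|\bigr)/2}.
\end{align*}
```

The MSE and MAPE expressions match what `LOOCV_MSE` and `LOOCV_MAPE` compute above.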

Best Answer

Why re-invent the wheel? R already has many libraries that implement cross-validation and calculate RMSE, MSE, MAPE, etc:

library(caret)
library(forecast)
library(rpart)
library(mda)

#Load Sample-Data
data(trees)

#Custom Summary Function for Cross-Validation
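customSummary <- function (data, lev = NULL, model = NULL) {
    stats1 <- postResample(data[, "pred"], data[, "obs"])
    stats2 <- accuracy(data[, "pred"], data[, "obs"])
    # accuracy() may return a one-row matrix; take the row so the metric names survive c()
    if (is.matrix(stats2)) stats2 <- stats2[1, ]
    # Keep only the metrics postResample() does not already report
    c(stats1, stats2[setdiff(names(stats2), names(stats1))])
}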

#Choose sampling method and summary function
myControl <- trainControl(method='LOOCV', summaryFunction=customSummary)

#Run Models
model_LM <- train(Volume~Girth+Height, data=trees, method='lm', 
                  trControl=myControl)
model_CART <- train(Volume~Girth+Height, data=trees, method='rpart', 
                  trControl=myControl)
model_MARS <- train(Volume~Girth+Height, data=trees, method='earth', 
                  trControl=myControl)

#Compare models
model_LM
model_CART
model_MARS

Which statistic to use is up to you and depends on the particulars of your problem. I usually use MAE as a default, but it pays to think about what an error actually costs in your problem.
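
As a rough illustration of why the choice matters, the sketch below recomputes the leave-one-out errors for the linear model from the question using base R only, then summarises the same errors three ways. The names `loo_errors`, `mae`, `rmse` and `mape` are made up for this example and are not part of caret or forecast.

```r
# Base R only; `trees` ships with the datasets package.
data(trees)

# Leave-one-out prediction errors (observed - predicted) for the
# linear model from the question. `loo_errors` is an illustrative name.
loo_errors <- sapply(seq_len(nrow(trees)), function(i) {
  fit <- lm(Volume ~ Girth + Height, data = trees[-i, ])
  trees$Volume[i] - predict(fit, newdata = trees[i, ])
})
loo_errors <- unname(loo_errors)

# Three summaries of the same errors (hypothetical helper functions).
mae  <- function(e) mean(abs(e))                    # each unit of error counts equally
rmse <- function(e) sqrt(mean(e^2))                 # large errors are penalised more
mape <- function(e, obs) 100 * mean(abs(e / obs))   # error relative to the observed value

metrics <- c(MAE  = mae(loo_errors),
             RMSE = rmse(loo_errors),
             MAPE = mape(loo_errors, trees$Volume))
print(round(metrics, 3))
```

MAE and RMSE are in the units of Volume, and RMSE is never smaller than MAE because squaring gives extra weight to the largest errors. MAPE is unit-free, but it is undefined when an observed value is zero and it weights errors on small observations more heavily. If a miss of a given size costs the same regardless of the tree's volume, MAE or RMSE is the more natural fit. If a miss matters in proportion to the volume, MAPE is closer to the actual cost.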
