Using R, I have developed three models:
- linear regression using lm();
- decision tree using rpart();
- k-nearest neighbors using kknn().
I would like to run leave-one-out cross-validation (LOOCV) tests and compare these models. Which error metric would best represent their performance? Would mean absolute percentage error (MAPE) or symmetric MAPE (sMAPE) be appropriate? Please suggest a metric.
For example, when I ran leave-one-out CV tests on the linear regression (LR) and decision tree (DT) models, the sMAPE values were 0.16 and 0.20 respectively, while the R-squared values of LR and DT were 0.85 and 0.92. Here sMAPE is computed as

sMAPE = (1/n) * sum( |predicted - actual| / ((predicted + actual) / 2) )

where n is the number of data points. The DT is a pruned regression tree. The R^2 values are computed on the full data set, which contains 60 data points in total.
Model R^2 sMAPE
LR 0.85 0.16
DT 0.92 0.20
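For reference, LOOCV with sMAPE (as defined above) can be sketched in a few lines of R. This is a minimal illustration, not your exact setup: the data frame `dat` and response column `y` are hypothetical placeholders, and `mtcars` merely stands in for your 60-point data set.

```r
library(rpart)

# sMAPE as defined in the question: mean of |pred - actual| / ((pred + actual) / 2)
smape <- function(actual, predicted) {
  mean(abs(predicted - actual) / ((predicted + actual) / 2))
}

# Leave-one-out CV: refit on n-1 points, predict the held-out point.
loocv_smape <- function(fit_fun, dat) {
  preds <- vapply(seq_len(nrow(dat)), function(i) {
    fit <- fit_fun(dat[-i, ])
    unname(predict(fit, newdata = dat[i, , drop = FALSE]))
  }, numeric(1))
  smape(dat$y, preds)
}

# Illustration on a built-in data set (placeholder for your own data):
dat <- data.frame(y = mtcars$mpg, x1 = mtcars$wt, x2 = mtcars$hp)
loocv_smape(function(d) lm(y ~ x1 + x2, data = d), dat)
loocv_smape(function(d) rpart(y ~ x1 + x2, data = d, method = "anova"), dat)
```

The same `loocv_smape` wrapper works for any model whose `predict` method returns a numeric vector, so the kknn() model can be compared the same way.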
Best Answer
Many metrics exist, and no single one is generally the best; the right choice depends on your problem and your data. Often several metrics can be used. I find it useful to compute both hypothesis tests and several different metrics (RMSE, MAPE, ...) and check whether they give similar results, so that your conclusions are not based on only one metric.
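As a hedged sketch of that advice: compute several metrics on the same set of LOOCV predictions and compare them side by side. The vectors `actual` and `predicted` below are placeholders for your held-out values and LOOCV predictions.

```r
# Compute several error metrics from one pair of actual/predicted vectors,
# so conclusions don't hinge on a single metric.
metrics <- function(actual, predicted) {
  err <- predicted - actual
  c(RMSE  = sqrt(mean(err^2)),
    MAE   = mean(abs(err)),
    MAPE  = mean(abs(err / actual)),
    sMAPE = mean(abs(err) / ((predicted + actual) / 2)))
}

# Placeholder values; substitute your LOOCV outputs for each model.
metrics(actual = c(10, 20, 30), predicted = c(12, 18, 33))
```

If RMSE, MAPE, and sMAPE all rank LR above DT (or vice versa), the ranking is more trustworthy than any one number on its own.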