For tree-based models, I've used varImp in caret to extract feature importances; however, this doesn't work with KNN. Can someone explain why that is, and whether it is possible at all? Thanks!
Solved – Is it possible to extract feature importances from a KNN model
Tags: caret, k-nearest-neighbour, machine-learning, r
Related Solutions
Alright, looking into the code of varImp.train I see that, in the case of a classification problem, the variable importance is just computed via the filterVarImp function. So the variables are simply ranked by their AUC, as stated in the varImp documentation under model-independent metrics.
I tested this by calling varImp on each of my models and comparing the variable importance values with the ones computed via filterVarImp on the training data.
## Compute variable importance via the filter approach
varImps.filtered <- filterVarImp(trainData, trainClasses)

## Model-specific importance as reported by varImp
varImps <- list(knn = varImp(models$knn, scale = FALSE),
                pda = varImp(models$pda, scale = FALSE))

## Sort the filtered importances by their average value
## over all classes, in decreasing order
varImps.filtered$Mean <- apply(varImps.filtered, 1, mean)
varImps.filtered <- varImps.filtered[with(varImps.filtered, order(-Mean)), ]
varImps.filtered$Mean <- NULL
... and surprise surprise, the values are exactly the same; varImp merely rounds them and sorts by maximum importance across classes rather than by mean.
> varImps$knn
ROC curve variable importance
variables are sorted by maximum importance across the classes
Class_1 Class_2 Class_3 Class_4 Class_5
V5 0.7094 0.9912 0.9431 0.9231 0.9912
V3 0.3706 0.5631 0.9744 0.9744 0.7831
V9 0.9725 0.9619 0.9725 0.8125 0.8988
V8 0.6887 0.6644 0.8650 0.9531 0.9531
V4 0.9325 0.9194 0.9325 0.6044 0.3138
V10 0.7250 0.8119 0.8544 0.8544 0.8331
V7 0.8169 0.7606 0.8244 0.7025 0.8244
V6 0.3650 0.5775 0.7838 0.8081 0.8081
V11 0.6194 0.7662 0.7662 0.6000 0.6506
V2 0.5138 0.7412 0.7412 0.5938 0.4031
U5 0.5609 0.5731 0.5731 0.4944 0.4834
U4 0.5259 0.5531 0.5531 0.5103 0.5109
U3 0.5134 0.5134 0.5103 0.5384 0.5384
U2 0.5384 0.5203 0.5216 0.5384 0.5219
U1 0.4853 0.5312 0.5312 0.5238 0.4872
> varImps.filtered
Class_1 Class_2 Class_3 Class_4 Class_5
V9 0.9725000 0.9618750 0.9725000 0.8125000 0.8987500
V5 0.7093750 0.9912500 0.9431250 0.9231250 0.9912500
V8 0.6887500 0.6643750 0.8650000 0.9531250 0.9531250
V10 0.7250000 0.8118750 0.8543750 0.8543750 0.8331250
V7 0.8168750 0.7606250 0.8243750 0.7025000 0.8243750
V4 0.9325000 0.9193750 0.9325000 0.6043750 0.3137500
V3 0.3706250 0.5631250 0.9743750 0.9743750 0.7831250
V11 0.6193750 0.7662500 0.7662500 0.6000000 0.6506250
V6 0.3650000 0.5775000 0.7837500 0.8081250 0.8081250
V2 0.5137500 0.7412500 0.7412500 0.5937500 0.4031250
U5 0.5609375 0.5731250 0.5731250 0.4943750 0.4834375
U4 0.5259375 0.5531250 0.5531250 0.5103125 0.5109375
U2 0.5384375 0.5203125 0.5215625 0.5384375 0.5218750
U3 0.5134375 0.5134375 0.5103125 0.5384375 0.5384375
U1 0.4853125 0.5312500 0.5312500 0.5237500 0.4871875
My goal was to come up with a model-specific, stable feature selection method. The only way I see to achieve this now is to use caret's built-in feature selection methods, such as Recursive Feature Elimination (RFE) and Selection by Filter (SBF). As far as I understand it, however, RFE in caret only supports a handful of models out of the box, so I might be forced to implement the rfeControl$functions myself.
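For what it's worth, caret ships a generic helper set, caretFuncs, that lets rfe refit an arbitrary train() model at each subset size, so you may not need to write all of the rfeControl$functions from scratch. A rough sketch for KNN, reusing the trainData/trainClasses objects from above (the subset sizes and tuning settings are illustrative assumptions):

```r
library(caret)

## Sketch: RFE wrapped around caret's train(), here with KNN.
## caretFuncs refits a full train() model for each candidate subset size.
ctrl <- rfeControl(functions = caretFuncs,
                   method    = "cv",
                   number    = 5)

set.seed(42)
rfeFit <- rfe(x          = trainData,
              y          = trainClasses,
              sizes      = c(2, 5, 10),   # candidate subset sizes (illustrative)
              rfeControl = ctrl,
              method     = "knn",         # passed through to train()
              tuneLength = 5)

predictors(rfeFit)   # the selected variables
```

One caveat: with caretFuncs the per-iteration variable ranking again comes from varImp, which for KNN falls back to the filter/AUC ranking discussed above, so the ranking step itself is still not model-specific.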
Best Answer
I don't know of a canned command, but you could always measure how much the mean-squared error (or misclassification rate) increases when a variable is either removed or permuted.
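A minimal sketch of that permutation idea for a caret KNN classifier, assuming the fitted models$knn from the question and some held-out testData/testClasses (those names are assumptions carried over from above):

```r
## Permutation importance: shuffle one predictor at a time and measure
## how much the accuracy drops relative to the unpermuted baseline.
permImportance <- function(fit, x, y, nrep = 10) {
  baseAcc <- mean(predict(fit, x) == y)
  sapply(names(x), function(v) {
    drops <- replicate(nrep, {
      xPerm <- x
      xPerm[[v]] <- sample(xPerm[[v]])   # break the predictor-response link
      baseAcc - mean(predict(fit, xPerm) == y)
    })
    mean(drops)   # average accuracy drop = importance of variable v
  })
}

imp <- sort(permImportance(models$knn, testData, testClasses),
            decreasing = TRUE)
```

Permuting (rather than removing) a column avoids refitting the model for every variable, at the cost of some Monte Carlo noise, which the nrep repetitions average out.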