Do random forest variable importance measures take interactions into account?

importance, random forest

Do random forest measures of variable importance (mean decrease in accuracy, mean decrease in Gini index) take interactions into account? I think I know how the variable importance plot is produced (by permuting each of the predictors in turn), and it doesn't seem to me that this captures interactions. Does anybody have another point of view? Thanks.

Best Answer

The variable importance obtained by permutation is computed by permuting the values of a single variable at a time. It therefore measures the importance of that variable while all the other data stay fixed. I think it is reasonable to say that the importance measure also includes interactions, if such interactions exist: permuting one variable breaks its association with the response and, at the same time, its joint structure with every other predictor. In that sense I see VI as an impure measure, influenced both by the main effect of the variable and by its interactions with the others.
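To make this concrete, here is a minimal sketch in plain Python. It is not taken from the randomForest package: the `model` function is a hypothetical stand-in for a fitted forest that has learned a pure interaction (the response is the XOR of two binary predictors, so neither has a main effect), and `permutation_importance` is an illustrative helper that mimics the permute-one-column idea on the full data set rather than on out-of-bag samples.

```python
import random

def model(row):
    # Hypothetical stand-in for a fitted forest: it predicts the XOR of the
    # first two features and ignores the third.
    return row[0] ^ row[1]

def accuracy(rows, labels):
    # Fraction of rows where the stand-in model matches the label.
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(rows)

def permutation_importance(rows, labels, col, rng):
    # Drop in accuracy after shuffling one column, all other columns fixed.
    baseline = accuracy(rows, labels)
    shuffled = [r[col] for r in rows]
    rng.shuffle(shuffled)
    permuted = [r[:col] + (v,) + r[col + 1:] for r, v in zip(rows, shuffled)]
    return baseline - accuracy(permuted, labels)

rng = random.Random(0)
# Three independent binary predictors; the response depends on x0 and x1
# only through their interaction, and x2 is pure noise.
rows = [(rng.randint(0, 1), rng.randint(0, 1), rng.randint(0, 1))
        for _ in range(2000)]
labels = [a ^ b for a, b, _ in rows]

# Marginal agreement of x0 with the label: close to chance, i.e. no main effect.
marginal_x0 = sum(r[0] == y for r, y in zip(rows, labels)) / len(rows)

importances = [permutation_importance(rows, labels, c, rng) for c in range(3)]
print("marginal agreement of x0:", marginal_x0)
print("permutation importances:", importances)
```

Under these assumptions, x0 and x1 each receive a large importance even though neither predicts the response on its own, while the noise variable x2 receives none. The importance value alone does not tell you whether it comes from a main effect or from an interaction.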

Gini importance is often found to agree with permutation importance, and I see it as a similar measure.

There is, however, a separate quantity called interaction that can be measured in random forests: it measures whether a split on one variable makes a split on another variable more or less likely. It can be computed for each pair of variables, so it is a two-variable interaction measure. If one wants to measure interactions among more than two variables, I suppose the procedure could be extended, but it soon becomes too computationally intensive.

This interaction measure is not implemented in the R package randomForest, as far as I know. Take a look at the brief description on Breiman's random forests page, in the Interactions section.
