Solved – Random Forest: Class-specific feature importance

classification, feature selection, r, random forest

I'm using the bigrf R package to analyse a dataset of approximately 50,000 observations x 120 variables, classified into two groups.

After growing a forest of 1000 trees, I investigate the importance of the 120 features and their interactions in relation to the 2 classes using the fastimp and interactions functions, respectively, which produce very nice results.

However, I'm now interested in investigating the problem using 3 (or more) classes rather than 2. In this case, the Gini variable importance calculated by fastimp only reflects overall importance, not importance for any particular class.

My question is: Is there a way to calculate a class-specific Gini variable importance, or some similar measure?

Best Answer

There are multiple ways of doing so:

1) Visualization - plot the abundance/frequency of each selected feature within each group as a bar plot. I would expect the top features to be visibly more abundant in one group than in the other groups.

2) Exhaustive way - build 3 Random Forest models, one for each pair of labels. Rank the features in each model, then plot the results and check whether the Gini score for feature x is high in both of the pairwise models that involve a given class.
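
To make the pairwise idea in 2) concrete, here is a minimal sketch in plain Python. It is not bigrf and it does not grow a forest: as a stand-in for a forest's Gini importance, it scores each feature by the largest Gini impurity decrease a single threshold split can achieve. The toy data, the function names (`pairwise_scores`, `class_specific`) and the choice to combine pairs by taking the minimum score are all assumptions made for illustration. In practice you would replace the stand-in score with the Gini importance from a forest fitted on each pair of classes.

```python
from itertools import combinations


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for lab in labels:
        counts[lab] = counts.get(lab, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split_gini_decrease(values, labels):
    """Largest Gini decrease from one threshold split on a single feature.

    Stand-in (assumption) for the Gini importance a forest would report.
    """
    n = len(labels)
    parent = gini(labels)
    best = 0.0
    # Try every distinct value except the largest as a "<= t" threshold.
    for t in sorted(set(values))[:-1]:
        left = [lab for v, lab in zip(values, labels) if v <= t]
        right = [lab for v, lab in zip(values, labels) if v > t]
        child = (len(left) * gini(left) + len(right) * gini(right)) / n
        best = max(best, parent - child)
    return best


def pairwise_scores(X, y):
    """Score every feature on every pair of classes.

    Returns {(class_a, class_b): [score for feature 0, feature 1, ...]}.
    """
    n_features = len(X[0])
    scores = {}
    for a, b in combinations(sorted(set(y)), 2):
        # Keep only the observations belonging to this pair of classes.
        subset = [(row, lab) for row, lab in zip(X, y) if lab in (a, b)]
        labs = [lab for _, lab in subset]
        scores[(a, b)] = [
            best_split_gini_decrease([row[j] for row, _ in subset], labs)
            for j in range(n_features)
        ]
    return scores


def class_specific(scores, cls):
    """Per feature, the lowest score over all pairs that involve cls.

    A feature scores high only if it separates cls from every other class.
    """
    per_pair = [s for pair, s in scores.items() if cls in pair]
    return [min(column) for column in zip(*per_pair)]


# Hypothetical toy data: feature 0 isolates class "A", feature 1 isolates "C".
X = [
    [0.0, 5.0], [0.1, 5.1],  # class A
    [1.0, 5.0], [1.1, 5.1],  # class B
    [1.0, 9.0], [1.1, 9.1],  # class C
]
y = ["A", "A", "B", "B", "C", "C"]

scores = pairwise_scores(X, y)
for cls in sorted(set(y)):
    print(cls, class_specific(scores, cls))
```

Taking the minimum over pairs is one reading of "higher on both combinations"; averaging the pairwise scores, or simply plotting them side by side, would be equally reasonable.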