I'm using the bigrf R package to analyse a dataset with ca. 50,000 observations x 120 variables, classified into two groups.
After growing a forest of 1000 trees, I investigate the importance of the 120 features and the relationships between them with respect to the 2 classes using, respectively, the fastimp and interactions functions, which produce very nice results.
However, I'm now interested in investigating the problem using 3 (or more) rather than 2 classes. In this case, the Gini variable importance calculated by fastimp only reflects overall importance.
My question is: Is there a way to calculate a class-specific Gini variable importance, or some similar measure?
Best Answer
There are multiple ways of doing so:
1) Visualization - you can plot the abundance/frequency of each selected feature within each group as a bar plot. I would expect the top features to be visibly more abundant in one group than in the other groups.
2) Exhaustive way - build 3 Random Forest models, one for each pair of labels. Rank the features in each model, then plot the results and see whether the Gini score for feature x is higher in both combinations that contain a given class.
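To make option 2 concrete, here is a minimal sketch of the pairwise bookkeeping in plain Python. It is not bigrf code: to keep it self-contained, the forest is swapped for a much simpler stand-in, namely the largest Gini impurity decrease a single threshold split on each feature can achieve. The function names (gini, best_gini_decrease, pairwise_importance) and the toy data are made up for illustration; in practice you would fit a real forest on each pair and read off its Gini importance instead.

```python
from itertools import combinations


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_gini_decrease(values, labels):
    """Largest impurity decrease from one threshold split on a feature.

    Simplified stand-in for a forest's Gini importance (assumption).
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    parent = gini([label for _, label in pairs])
    best = 0.0
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between two equal values
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        child = (len(left) * gini(left) + len(right) * gini(right)) / n
        best = max(best, parent - child)
    return best


def pairwise_importance(X, y):
    """Map each pair of classes to a list of per-feature scores."""
    n_features = len(X[0])
    result = {}
    for a, b in combinations(sorted(set(y)), 2):
        # Keep only the observations belonging to this pair of classes.
        rows = [i for i, label in enumerate(y) if label in (a, b)]
        sub_y = [y[i] for i in rows]
        result[(a, b)] = [
            best_gini_decrease([X[i][j] for i in rows], sub_y)
            for j in range(n_features)
        ]
    return result


# Toy data (hypothetical): feature 0 separates class A from the rest,
# feature 1 separates class C from the rest.
X = [[0.1, 1.0], [0.2, 1.1], [0.9, 1.2], [1.0, 1.0], [0.9, 5.0], [1.1, 5.2]]
y = ["A", "A", "B", "B", "C", "C"]

scores = pairwise_importance(X, y)
for pair, feature_scores in scores.items():
    print(pair, feature_scores)
```

In this toy example, feature 0 gets its top score in both pairs that contain A, and feature 1 in both pairs that contain C, which is the pattern to look for when reading the per-pair rankings. With more than 3 classes the number of pairs grows quadratically, so this gets expensive quickly.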