Solved – Random forest regression produces different importance rankings

importance, r, random forest, regression

I've been working on a random forest regression model with my own datasets using the randomForest package.

The model produces a % var explained of ~27%. I also used the importance() function to rank the variables by relative importance, but every time I pass a different number to set.seed(), the random forest model produces a different ranking. It should be mentioned that several of my variables have importance values that are relatively close.

Is there a consistent way to produce the variable importance ranking?

Best Answer

No. No single variable importance value that stays the same across seeds is possible in this case.

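The variable importance changes because the underlying model itself changes each time you change the seed value: each forest draws different bootstrap samples and considers different randomly chosen candidate variables at each split. Different models will have different values for variable importance, and when several variables have similar importance, small differences are enough to reorder them.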

One thing you could do is train many (say, 10) random forest models with different seed values and average the variable importance scores across them. This gives a better approximation of the variable importance you can expect, on average, from an individual random forest model trained on the dataset.
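A minimal sketch of this averaging approach, assuming the randomForest package is installed. The built-in mtcars data, the response mpg, the seeds 1 to 10, and ntree = 500 are stand-ins chosen for illustration, not details from the question; substitute your own data frame and formula.

```r
library(randomForest)

# Assumed seeds for illustration; any set of distinct integers works.
seeds <- 1:10

# Fit one forest per seed and collect permutation importance (%IncMSE).
# mtcars and mpg are placeholders for your own data and response.
imp_by_seed <- sapply(seeds, function(s) {
  set.seed(s)
  fit <- randomForest(mpg ~ ., data = mtcars, ntree = 500, importance = TRUE)
  importance(fit, type = 1)[, 1]  # named vector, one value per predictor
})

# Rows are predictors, columns are seeds: average across seeds
# and keep the standard deviation as a measure of seed-to-seed spread.
imp_summary <- data.frame(
  mean_imp = rowMeans(imp_by_seed),
  sd_imp   = apply(imp_by_seed, 1, sd)
)

# Rank predictors by mean importance, highest first.
imp_summary <- imp_summary[order(-imp_summary$mean_imp), ]
print(imp_summary)
```

Variables whose mean scores differ by less than their spread across seeds are probably best treated as tied rather than strictly ordered. Fixing a single seed also gives a reproducible ranking, but that ranking only reflects one particular forest.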
