Solved – Why is the variable importance metric suggested by Breiman specific only to random forests

importance, machine learning, random forest

In the Random Forest paper, Breiman describes a nice way of measuring variable importance: take your validation data, measure the error rate, permute the variable, and re-measure the error rate.

Question – why is that method specific to Random Forests? I understand that in other classifiers (SVM, LR, etc.) we don't have the concept of OOB, but we certainly can use a regular train-validation split.
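The train-validation variant described above can be sketched for an arbitrary classifier. The block below is a minimal illustration under stated assumptions, not Breiman's implementation: the nearest-centroid model, the synthetic data, and every function name are hypothetical stand-ins chosen so the example runs on the Python standard library alone.

```python
import random

def fit_centroids(X, y):
    """Toy nearest-centroid classifier: one mean vector per class."""
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids

def predict(centroids, x):
    """Return the class whose centroid is closest to x."""
    return min(centroids,
               key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))

def error_rate(centroids, X, y):
    return sum(predict(centroids, x) != t for x, t in zip(X, y)) / len(y)

def permutation_importance(model, X_val, y_val, feature, rng, n_repeats=20):
    """Mean increase in validation error after shuffling one feature column."""
    baseline = error_rate(model, X_val, y_val)
    increases = []
    for _ in range(n_repeats):
        column = [x[feature] for x in X_val]
        rng.shuffle(column)
        X_perm = [x[:feature] + [v] + x[feature + 1:]
                  for x, v in zip(X_val, column)]
        increases.append(error_rate(model, X_perm, y_val) - baseline)
    return sum(increases) / n_repeats

def make_data(n, rng):
    """Synthetic data (assumption): feature 0 separates classes, feature 1 is noise."""
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        X.append([rng.gauss(4.0 * label, 1.0), rng.gauss(0.0, 1.0)])
        y.append(label)
    return X, y

rng = random.Random(0)
X_train, y_train = make_data(400, rng)
X_val, y_val = make_data(400, rng)      # held-out validation split
model = fit_centroids(X_train, y_train)

imp_signal = permutation_importance(model, X_val, y_val, 0, rng)
imp_noise = permutation_importance(model, X_val, y_val, 1, rng)
print("informative feature:", imp_signal)
print("noise feature:", imp_noise)
```

Nothing in `permutation_importance` depends on the model being a forest; it only needs a fitted model and a held-out set.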

What am I missing here? Why isn't this method a common practice?

Best Answer

Any bagged learner can produce an analogue of the Random Forest importance metric.
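As a rough sketch of what such an analogue might look like, the block below bags a toy nearest-centroid classifier and, for each bootstrap model, measures the increase in out-of-bag error after permuting one feature among the out-of-bag rows. The base learner, the synthetic data, and all names are assumptions made for illustration; only the per-model "permute on OOB rows, then average" idea is taken from the discussion.

```python
import random

def fit_centroids(X, y):
    """Toy base learner: one mean vector per class."""
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids

def predict(centroids, x):
    return min(centroids,
               key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))

def error_rate(centroids, X, y):
    return sum(predict(centroids, x) != t for x, t in zip(X, y)) / len(y)

def bagged_oob_importance(X, y, n_models, rng):
    """Average, over bootstrap models, of the OOB error increase per feature."""
    n, p = len(X), len(X[0])
    totals = [0.0] * p
    used = 0
    for _ in range(n_models):
        in_bag = [rng.randrange(n) for _ in range(n)]   # bootstrap sample
        in_bag_set = set(in_bag)
        oob = [i for i in range(n) if i not in in_bag_set]
        if not oob:                                     # practically never happens
            continue
        model = fit_centroids([X[i] for i in in_bag], [y[i] for i in in_bag])
        X_oob = [X[i] for i in oob]
        y_oob = [y[i] for i in oob]
        baseline = error_rate(model, X_oob, y_oob)
        for j in range(p):
            column = [x[j] for x in X_oob]
            rng.shuffle(column)                         # permute within OOB rows only
            X_perm = [x[:j] + [v] + x[j + 1:] for x, v in zip(X_oob, column)]
            totals[j] += error_rate(model, X_perm, y_oob) - baseline
        used += 1
    return [t / used for t in totals]

# Synthetic data (assumption): feature 0 is informative, feature 1 is noise.
rng = random.Random(1)
X, y = [], []
for _ in range(300):
    label = rng.randint(0, 1)
    X.append([rng.gauss(4.0 * label, 1.0), rng.gauss(0.0, 1.0)])
    y.append(label)

importances = bagged_oob_importance(X, y, n_models=30, rng=rng)
print(importances)
```

No separate validation split is needed here, because each model is scored only on the rows its bootstrap sample left out.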

You can't get this kind of feature importance in a common cross-validation scheme, where all the features are used all the time.

Related Solutions

Machine Learning – Understanding Out of Bag Error in Random Forest and Data Partitioning

Training a model, tuning its hyperparameters, and evaluating its performance are typically done using independent training, validation, and test sets. This three-way split can take the form of holdout or nested cross validation. The independence of these sets is important because, otherwise, estimates of the error would be downwardly biased--we'd select poor models and expect them to perform better on future data than they really would. Because random forests already use bootstrapping for fitting individual trees, they readily yield the out-of-bag (OOB) error. This is an unbiased estimate of the error on future data. As such, it can take the place of the validation or test error, and is cheaper to compute than using nested cross validation.
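To make the OOB idea concrete, here is a minimal sketch that computes an OOB error for a bagged ensemble: each training point is predicted only by the models whose bootstrap sample did not contain it, and those votes are compared with the true label. The one-dimensional class-mean base learner and the synthetic data are assumptions standing in for trees and real data; a fresh sample is drawn at the end purely as a sanity check.

```python
import random

def fit_means(xs, ys):
    """Toy 1-D base learner: remember the mean of x for each class."""
    means = {}
    for label in set(ys):
        vals = [x for x, t in zip(xs, ys) if t == label]
        means[label] = sum(vals) / len(vals)
    return means

def predict(means, x):
    return min(means, key=lambda c: abs(x - means[c]))

def oob_error(xs, ys, n_models, rng):
    """Error of majority votes cast only by models that did not train on the point."""
    n = len(xs)
    votes = [dict() for _ in range(n)]
    for _ in range(n_models):
        in_bag = [rng.randrange(n) for _ in range(n)]    # bootstrap sample
        model = fit_means([xs[i] for i in in_bag], [ys[i] for i in in_bag])
        for i in set(range(n)) - set(in_bag):            # out-of-bag points
            label = predict(model, xs[i])
            votes[i][label] = votes[i].get(label, 0) + 1
    scored = [(max(v, key=v.get), ys[i]) for i, v in enumerate(votes) if v]
    return sum(pred != truth for pred, truth in scored) / len(scored)

def make_data(n, rng):
    """Synthetic data (assumption): two overlapping Gaussian classes."""
    ys = [rng.randint(0, 1) for _ in range(n)]
    xs = [rng.gauss(2.0 * label, 1.0) for label in ys]
    return xs, ys

rng = random.Random(3)
xs, ys = make_data(500, rng)
oob = oob_error(xs, ys, n_models=50, rng=rng)

# Sanity check on fresh data, using one model fit on the full training set.
new_xs, new_ys = make_data(2000, rng)
full_model = fit_means(xs, ys)
fresh = sum(predict(full_model, x) != t for x, t in zip(new_xs, new_ys)) / len(new_ys)
print("OOB error:", oob, "error on fresh data:", fresh)
```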

If we had a fixed set of hyperparameters, we could train a random forest on the entire dataset, estimate performance using the OOB error, and call it a day. But random forests have hyperparameters that may need to be tuned to balance between under- and overfitting. One of these is the number of features considered for each split. Another is tree size, which is typically controlled by limiting the depth or number of nodes when growing the tree, rather than by pruning after the fact. Rather than splitting the data into training, validation, and test sets, we can use the OOB error in place of the validation or test set error. For example, hyperparameters could be tuned to minimize OOB error and performance could be evaluated on the test set (possibly using cross validation, with no need for nesting).
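The tuning workflow in the last sentence can be sketched as follows. To keep the example short and dependency-free, a bagged k-nearest-neighbour classifier stands in for the random forest, and its `k` plays the role of the hyperparameter; this substitution, the synthetic data, and all names are assumptions. The OOB error selects `k`, and a separate test set is touched only once at the end.

```python
import random

def majority(votes):
    return max(set(votes), key=votes.count)

def knn_predict(xs, ys, bag, x, k):
    """Vote among the k nearest points of one bootstrap sample (1-D inputs)."""
    nearest = sorted(bag, key=lambda i: abs(xs[i] - x))[:k]
    return majority([ys[i] for i in nearest])

def oob_error(xs, ys, bags, k):
    """OOB error of the bagged k-NN ensemble for a given k."""
    n = len(xs)
    votes = [[] for _ in range(n)]
    for bag in bags:
        for i in set(range(n)) - set(bag):   # points this bag never saw
            votes[i].append(knn_predict(xs, ys, bag, xs[i], k))
    scored = [(majority(v), ys[i]) for i, v in enumerate(votes) if v]
    return sum(pred != truth for pred, truth in scored) / len(scored)

def ensemble_predict(xs, ys, bags, k, x):
    return majority([knn_predict(xs, ys, bag, x, k) for bag in bags])

def make_data(n, rng):
    """Synthetic data (assumption): two overlapping Gaussian classes."""
    ys = [rng.randint(0, 1) for _ in range(n)]
    xs = [rng.gauss(2.0 * label, 1.0) for label in ys]
    return xs, ys

rng = random.Random(2)
xs, ys = make_data(150, rng)
n = len(xs)
# The same bootstrap samples are reused for every candidate k.
bags = [[rng.randrange(n) for _ in range(n)] for _ in range(15)]

candidates = [1, 5, 15]                      # odd values avoid tied votes
oob_by_k = {k: oob_error(xs, ys, bags, k) for k in candidates}
best_k = min(oob_by_k, key=oob_by_k.get)     # tune on OOB error only

# Final, one-off evaluation on data that played no part in tuning.
test_xs, test_ys = make_data(200, rng)
test_error = sum(ensemble_predict(xs, ys, bags, best_k, x) != t
                 for x, t in zip(test_xs, test_ys)) / len(test_ys)
print("OOB error by k:", oob_by_k)
print("chosen k:", best_k, "test error:", test_error)
```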

Solved – Per cent increase in MSE (%IncMSE) random forests importance measure: why is mean prediction error divided by standard deviation

It seems analogous to the computation of an effect size. It reflects the mean increase in MSE the variable contributes, divided by a measure of its variability:

For each tree, we get a difference between two MSE values. Averaging over trees gives the mean difference between the two MSE values.

The standard deviation of the differences reflects the variation around the mean, a measure of residual error (cf. pooled standard deviation in ANOVA). Dividing the mean by this standard deviation gives an effect size (cf. Cohen's $d$ in ANOVA).
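A small worked example of that ratio, using made-up per-tree differences (the five numbers are purely illustrative, and the use of the sample standard deviation is an assumption about the normalization, not a statement of what any particular package does):

```python
from statistics import mean, stdev

# Hypothetical increase in OOB MSE for each of five trees after permuting
# one variable (MSE with the variable permuted minus MSE without).
mse_increase_per_tree = [0.9, 1.1, 1.0, 1.2, 0.8]

mean_increase = mean(mse_increase_per_tree)   # average effect across trees
sd_increase = stdev(mse_increase_per_tree)    # spread of the effect (sample SD)
scaled_importance = mean_increase / sd_increase

print(mean_increase, sd_increase, scaled_importance)
```

Because the differences vary little from tree to tree relative to their mean, the ratio comes out well above 1, which is why a scaled importance is not bounded the way a proportion would be.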

Note that it is thus possible to obtain a %IncMSE value above 100 (just as it is possible to obtain a Cohen's $d$ effect size > 1; %IncMSE should not be interpreted as $R^2$):

> set.seed(42)
> x <- rnorm(1000)
> y <- x + rnorm(1000, sd = .01)
> library(randomForest)
> rf <- randomForest(x ~ y, importance = TRUE)
> importance(rf)
   %IncMSE IncNodePurity
y 293.7449      1004.795
