Solved – In a random forest algorithm, how can one interpret the importance of each feature

importance, MATLAB, random forest

I am building a Random Forest in MATLAB using the TreeBagger function. According to the documentation, the trained model exposes three properties that measure the importance of the input features. My question is: how should I interpret these values? The second one (DeltaError) seems straightforward, but I don't have a firm grasp of what "classification margin", "raised margins", and "lowered margins" mean in the following descriptions.

OOBPermutedPredictorCountRaiseMargin

A numeric array of size 1-by-Nvars containing a measure of variable importance for each predictor variable (feature). For any variable, the measure is the difference between the number of raised margins and the number of lowered margins if the values of that variable are permuted across the out-of-bag observations. This measure is computed for every tree, then averaged over the entire ensemble and divided by the standard deviation over the entire ensemble. This property is empty for regression trees.

OOBPermutedPredictorDeltaError

A numeric array of size 1-by-Nvars containing a measure of importance for each predictor variable (feature). For any variable, the measure is the increase in prediction error if the values of that variable are permuted across the out-of-bag observations. This measure is computed for every tree, then averaged over the entire ensemble and divided by the standard deviation over the entire ensemble.

OOBPermutedPredictorDeltaMeanMargin

A numeric array of size 1-by-Nvars containing a measure of importance for each predictor variable (feature). For any variable, the measure is the decrease in the classification margin if the values of that variable are permuted across the out-of-bag observations. This measure is computed for every tree, then averaged over the entire ensemble and divided by the standard deviation over the entire ensemble. This property is empty for regression trees.
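All three measures come from the same out-of-bag permutation procedure, and in recent MATLAB releases they are exposed as properties of the trained ensemble when importance is requested at training time. A minimal sketch, using the Fisher iris data that ships with the Statistics and Machine Learning Toolbox (the tree count and seed are arbitrary choices):

```matlab
% Load a built-in classification dataset (meas: 150x4 predictors, species: labels).
load fisheriris

rng(1);  % arbitrary seed, for reproducibility of the bootstrap samples

% Train 100 bagged classification trees and ask for out-of-bag
% permutation importance while training.
mdl = TreeBagger(100, meas, species, ...
    'Method', 'classification', ...
    'OOBPredictorImportance', 'on');

% The three importance measures, one value per predictor (1-by-4 here):
deltaError = mdl.OOBPermutedPredictorDeltaError;       % increase in OOB error
countRaise = mdl.OOBPermutedPredictorCountRaiseMargin; % raised minus lowered margins
deltaMean  = mdl.OOBPermutedPredictorDeltaMeanMargin;  % drop in mean margin
```

Note that the margin-based properties are populated only for classification; for regression ensembles they are empty, as the quoted documentation says.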

Best Answer

The classification margin is the difference between the predicted probability for the true class and the highest probability predicted for any of the other (negative) classes. A good classifier produces large margins. A predictor raises an observation's margin if including it increases the gap between the predicted probability for the true class and the highest negative class; if that gap shrinks, the margin is lowered. The intuition is that a useful feature raises more margins than it lowers, whereas permuting a useful feature's values tends to lower margins across the out-of-bag sample.
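To make the margin concrete, here is a small sketch of how it can be computed from one row of predicted class probabilities (the numbers are invented for illustration):

```matlab
% Predicted class probabilities for one observation over classes {A, B, C},
% where the true class is A (index 1). Values are made up for illustration.
scores  = [0.60, 0.30, 0.10];
trueIdx = 1;

pTrue = scores(trueIdx);                             % probability of the true class
pBest = max(scores([1:trueIdx-1, trueIdx+1:end]));   % best negative-class probability

margin = pTrue - pBest;   % here 0.60 - 0.30 = 0.30

% A confident, correct prediction gives a large positive margin; a
% misclassified observation has a negative margin. Permuting an important
% predictor's values tends to shrink (lower) these margins, which is what
% the Count/DeltaMean margin measures above are summarizing.
```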