Solved – variable importance in boosted regression tree

boosting, cart, rpart

I am having trouble understanding how the relative influence of a variable is calculated in a boosted regression tree. I am reading the following paper by Friedman and Meulman:

Multiple additive regression trees with application in epidemiology
http://onlinelibrary.wiley.com/doi/10.1002/sim.1501/pdf

"The relative contribution of any one explanatory variable ($x_j$) is based on how often it is selected to split individual trees, weighted by the squared improvement to the model ($I_j^2$) resulting from the sum of these trees (i.e. from $m = 1$ to $M$ the total number of trees):

$$\hat I_j^2 = \frac{1}{M} \sum_{m=1}^M I_j^2(T_m)$$

where $I_j^2(T_m)$ is the relative influence of input variable $j$ for individual tree $T_m$."

I do not understand how the term $I_j^2$ (the squared improvement to the model) is calculated for each tree. Can anyone please explain this to me?

Best Answer

This is what the paper says:

"As noted above, all of the input predictor variables are seldom equally relevant for prediction. Often only a few of them have substantial influence on the response; the vast majority are irrelevant and could just as well have not been measured. It is often useful to learn the relative importance or contribution of each input variable in predicting the response. For a single tree T, Breiman et al. [1] proposed a measure of (squared) relevance for each predictor variable $x_j$, based on the number of times that variable was selected for splitting in the tree weighted by the squared improvement to the model as a result of each of those splits. This importance measure is easily generalized to additive tree expansions (3); it is simply averaged over the trees."
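Written out, the per-tree measure the paper describes usually takes the following form. The symbols $J$, $v_t$, $\hat\imath_t^2$, $w_l$, $w_r$, $\bar y_l$, $\bar y_r$ are introduced here for illustration and are not taken from the quoted text. For a tree $T$ with $J$ terminal nodes, and therefore $J-1$ internal (splitting) nodes:

$$I_j^2(T) = \sum_{t=1}^{J-1} \hat\imath_t^2 \, \mathbf{1}(v_t = j)$$

where $v_t$ is the variable used to split node $t$ and $\hat\imath_t^2$ is the improvement in squared error produced by that split. Under squared-error loss, a split of a node into left and right daughters gives

$$\hat\imath_t^2 = \frac{w_l w_r}{w_l + w_r}\,(\bar y_l - \bar y_r)^2$$

with $w_l, w_r$ the total weights (the observation counts, if unweighted) and $\bar y_l, \bar y_r$ the mean responses in the two daughters. In boosting, each tree is fit to the current residuals, so $\bar y_l, \bar y_r$ refer to those. Averaging $I_j^2(T_m)$ over the $M$ trees then gives $\hat I_j^2$.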

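So once you have the per-tree estimate, you simply average it over all trees. Within each tree, the estimate depends on how many times the variable was used for splitting and how much each of those splits improved the model, however improvement is being measured (e.g., reduction in squared error for a regression tree, or accuracy/impurity for classification). Have you had a look at the Breiman et al. book?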
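To make the arithmetic concrete, here is a minimal Python sketch, assuming squared-error loss and unweighted observations. No trees are actually fit: each tree is represented by a hypothetical list of `(variable_index, improvement)` pairs, one per splitting node, and the function names are made up for illustration (they are not from any boosting library).

```python
from collections import defaultdict


def sse(values):
    """Sum of squared deviations of `values` from their mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def split_improvement(left, right):
    """Squared-error improvement from splitting a node into `left` and `right`.

    Computes n_l * n_r / (n_l + n_r) * (mean_l - mean_r)^2, which equals
    sse(left + right) - sse(left) - sse(right) for unweighted observations.
    """
    n_l, n_r = len(left), len(right)
    mean_l = sum(left) / n_l
    mean_r = sum(right) / n_r
    return n_l * n_r * (mean_l - mean_r) ** 2 / (n_l + n_r)


def tree_influence(splits):
    """I_j^2(T): add up the improvements of every split made on variable j.

    `splits` is a hypothetical list of (variable_index, improvement)
    pairs, one per internal node of a single tree.
    """
    influence = defaultdict(float)
    for var, improvement in splits:
        influence[var] += improvement
    return dict(influence)


def ensemble_influence(trees, n_vars):
    """hat I_j^2: average I_j^2(T_m) over the M trees, for j = 0..n_vars-1."""
    totals = [0.0] * n_vars
    for splits in trees:
        for var, value in tree_influence(splits).items():
            totals[var] += value
    return [total / len(trees) for total in totals]


# One split of the responses (in boosting, the current residuals).
left, right = [1.0, 2.0, 3.0], [7.0, 9.0]
gain = split_improvement(left, right)
print(gain)  # 43.2
print(sse(left + right) - sse(left) - sse(right))  # same value, up to rounding

# Two toy trees over three variables; variable 2 is never used to split.
trees = [
    [(0, gain), (1, 5.0)],  # tree 1: split on x0, then on x1
    [(0, 10.0), (0, 2.0)],  # tree 2: split on x0 twice
]
print(ensemble_influence(trees, n_vars=3))  # [27.6, 2.5, 0.0]
```

A variable that is never chosen for a split gets zero influence, and one that is chosen often but with small improvements can still rank below one chosen once with a large improvement. Implementations differ in what they finally report: the squared quantity itself, its square root, or a rescaled version (for example, normalized so the values sum to 100).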

Breiman L, Friedman JH, Olshen R, Stone C. Classification and Regression Trees. Wadsworth: Pacific Grove, 1984.