Solved – What calculation does XGBoost use for feature importances

boosting, interpretation

Does anyone know how the feature importances (importance_type='gain') in the xgboost library are actually calculated? I looked through the documentation and consulted some other pages, but I couldn't find an exact reference for the underlying calculation.

I would be glad for any scientific reference on the calculation method, as I'd like to cite it.

Best Answer

As with random forests, there are different ways to compute feature importance. XGBoost, a particular package that implements gradient-boosted trees, offers the following options, quoted from its documentation (a short usage sketch follows the quote):

How the importance is calculated: either “weight”, “gain”, or “cover”
- “weight” is the number of times a feature appears in a tree
- “gain” is the average gain of splits which use the feature
- “cover” is the average coverage of splits which use the feature, where coverage is defined as the number of samples affected by the split

(Source: https://xgboost.readthedocs.io/en/latest/python/python_api.html)
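
For illustration, here is a minimal sketch of how those three importance types can be queried through the native Python API; the toy data, parameters, and feature names are my own assumptions, not part of the original question.

```python
import numpy as np
import xgboost as xgb

# Toy data: the label depends mainly on the first two features (illustrative assumption).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

dtrain = xgb.DMatrix(X, label=y, feature_names=["f0", "f1", "f2", "f3"])
booster = xgb.train({"objective": "binary:logistic", "max_depth": 3},
                    dtrain, num_boost_round=20)

# get_score returns a {feature_name: score} dict for the chosen importance type;
# features that are never used for a split are simply omitted from the dict.
for imp_type in ("weight", "gain", "cover"):
    print(imp_type, booster.get_score(importance_type=imp_type))
```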

Now, the gain is essentially the information gain of a split, averaged over all splits that use the feature, across all trees. For a given node, you first compute the impurity of the parent node (e.g., using Gini or entropy as a criterion). Then you compute the impurities of the child nodes that would result from splitting on a given feature. Finally, the information gain is the parent impurity minus the weighted sum of the child impurities, where each child is weighted by its share of the parent's samples. Let me know if you need more details on that.
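
Written out, the per-split information gain described above takes the standard impurity-based form (a generic textbook formula, not quoted from the XGBoost source): for a parent node with $N$ samples split into children with $N_L$ and $N_R$ samples,

$$
\mathrm{IG} \;=\; I(\text{parent}) \;-\; \frac{N_L}{N}\, I(\text{left}) \;-\; \frac{N_R}{N}\, I(\text{right}),
$$

where $I(\cdot)$ is the chosen impurity criterion (e.g., Gini or entropy). The reported “gain” importance of a feature is then the average of these per-split gains over all splits that use that feature.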
