Solved – Importance of variables in Decision trees

cart, python, scikit-learn

I'm training decision trees for a project in which I want to predict the behavior of one variable from the others (there are about 20 other variables).

Some time ago I was using simple logistic regression models in another project (using R). R has a nice feature where you can see the statistical significance of every variable introduced in the model. This helps simplify the model by removing non-significant variables.

I wonder if there is a way to do the same with decision trees (this time I'm using Python and scikit-learn). Is there a simple way to see which variables are really important for a trained model?

Thanks!

Best Answer

You can use the `feature_importances_` attribute to get the feature importances. First of all, build your classifier and fit it to your data (the attribute only exists on a fitted model):

clf = DecisionTreeClassifier()
clf.fit(X, y)

Now

clf.feature_importances_

will give you the desired results: one score per feature, normalized to sum to 1.

The importance of a feature is computed as the (normalized) total reduction of the criterion brought by that feature. It is also known as the Gini importance.
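Putting it together, here is a minimal, self-contained sketch. The iris dataset is used purely for illustration (the questioner's own ~20-variable dataset would take its place); everything else is standard scikit-learn API.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

# Illustrative dataset; substitute your own X (features) and y (target).
data = load_iris()
X, y = data.data, data.target

clf = DecisionTreeClassifier(random_state=0)
clf.fit(X, y)

# One Gini-importance score per feature; scores are non-negative
# and normalized so that they sum to 1.
importances = clf.feature_importances_

# Rank features from most to least important.
ranked = sorted(zip(data.feature_names, importances),
                key=lambda pair: pair[1], reverse=True)
for name, score in ranked:
    print(f"{name}: {score:.3f}")
```

Features with importance near zero contribute little to the tree's splits and are natural candidates for removal, analogous to dropping non-significant variables from a regression model in R.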