I'm training decision trees for a project in which I want to predict the behavior of one variable from the others (there are about 20 other variables).
Some time ago I was using simple logistic regression models in another project (using R). R has a nice feature that shows the statistical significance of every variable introduced in the model, which helps simplify the model by removing variables that are not meaningful.
I wonder if there is a way to do the same with decision trees (this time I'm using Python and scikit-learn). Is there a simple way to see which variables are really important to a trained model?
Thanks!
Best Answer
You can use the `feature_importances_` attribute to get the feature importances. First, build and fit your classifier; then reading `clf.feature_importances_` will give you the desired results.
The importance of a feature is computed as the (normalized) total reduction of the criterion brought by that feature. It is also known as the Gini importance.
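A minimal sketch of the steps above, using synthetic data to stand in for your ~20 predictors (the dataset and the `clf` variable name are illustrative, not from your project):

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a dataset with 20 predictor variables.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# First, build and fit the classifier.
clf = DecisionTreeClassifier(random_state=0)
clf.fit(X, y)

# One Gini-importance score per feature; the scores are normalized to sum to 1.
for i, imp in enumerate(clf.feature_importances_):
    print(f"feature {i}: {imp:.3f}")
```

Sorting the features by these scores makes it easy to spot candidates for removal, much like dropping non-significant variables from a regression model.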