I have trained a random forest classifier using the sklearn Python package, and used it to classify a datapoint with a certain feature vector.
Let's assume that the random forest has only one tree, that this is a binary classification task, and the data point has been labeled as class '0', while I was expecting it to be '1'. How can I check which features were responsible for such classification? Is there a way to get the list of split-thresholds for each feature?
How can this be generalised to the multiclass case, with multiple trees?
Best Answer
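In the canonical implementation of random forest (R's randomForest package), there is a way to produce a local importance matrix that tells you which feature(s) contributed to the model's prediction for each observation: fit the model with localImp=TRUE and read the matrix from the fitted object's localImportance component (called locImp here).
The rows of locImp are the features and the columns are the observations, so locImp[,1] gives the importance of each feature for the first observation. In the example, that column shows that Petal.Width has the most weight in predicting setosa on the first observation.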
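For the sklearn side of the question (the list of split thresholds a data point was tested against), here is a minimal sketch rather than a definitive method. It assumes the iris data, chosen only because the example above mentions Petal.Width and setosa, and a small forest of 5 trees. The helper name trace_splits is hypothetical; the sklearn pieces it relies on are estimators_, decision_path, and the tree_ arrays feature, threshold and children_left. Note that this lists the tests along each tree's path; it is not the same thing as R's local importance.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

# Assumed setup for illustration: iris data and a small forest.
iris = load_iris()
X, y = iris.data, iris.target
rf = RandomForestClassifier(n_estimators=5, random_state=0).fit(X, y)


def trace_splits(forest, x, feature_names):
    """For each tree, list the (feature, threshold, value, op) tests applied to x.

    Hypothetical helper: op is "<=" when x went to the left child, ">" otherwise.
    """
    # sklearn trees compare float32 feature values against the thresholds.
    x = np.asarray(x, dtype=np.float32).reshape(1, -1)
    traces = []
    for est in forest.estimators_:
        tree = est.tree_
        # Node ids visited by this single sample, from root to leaf.
        node_ids = est.decision_path(x).indices
        steps = []
        for node in node_ids:
            if tree.children_left[node] == -1:
                continue  # leaf node: no split test here
            f = tree.feature[node]
            thr = tree.threshold[node]
            op = "<=" if x[0, f] <= thr else ">"
            steps.append((feature_names[f], float(thr), float(x[0, f]), op))
        traces.append(steps)
    return traces


if __name__ == "__main__":
    for i, steps in enumerate(trace_splits(rf, X[0], iris.feature_names)):
        print(f"tree {i}:")
        for name, thr, val, op in steps:
            print(f"  {name} = {val:.2f} {op} {thr:.2f}")
```

With a single tree, the printed path is the complete explanation of the prediction. With several trees and several classes, the same loop applies unchanged; one rough way to summarise it is to count how often each feature appears across the trees' paths for the data point in question.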