Solved – Partial Dependence plot interpretation for Categorical variables

data visualization, non-independent, partial-effect

I am using a partial dependence plot from a random forest, and the plot doesn't make sense to me. Only 62 of the 933 people whose education is 10th have outcome 1, yet the partial plot shows a positive bar for that category. Meanwhile, 3/4 of the doctorate population has outcome 1, and the partial plot shows a negative bar.



Data: Count falling under each category of education

Best Answer

Partial plots don't have to point in the same direction as the univariate data; in fact, this is what makes them useful.

Partial plots show the marginal effect of just this variable, averaged over the other predictors. It is likely that other predictors in your dataset are heavily correlated with Education=10th and Education=Doctorate and already account for the univariate effect. Once that effect is controlled for, Education=Doctorate really does reduce your propensity, in the model, to be whatever your DV (the outcome being predicted) is.
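For reference, the usual partial dependence estimate can be written as below. The notation is introduced here and is not from the question: $\hat{f}$ is the fitted model, $x_S$ is the feature of interest (education), $x_C^{(i)}$ are the observed values of all other predictors for row $i$, and $n$ is the number of rows.

```latex
\hat{f}_S(x_S) = \frac{1}{n} \sum_{i=1}^{n} \hat{f}\bigl(x_S,\, x_C^{(i)}\bigr)
```

So each bar reflects an average of model predictions with education forced to one level for every row, not the observed outcome rate within that level.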

Here's a contrived example. Imagine we're trying to predict drinks_coffee, and have data like this:

education  likes_coffee  drinks_coffee
     10th             1              1
     10th             0              0
     10th             0              0
     10th             0              0
Doctorate             1              1
Doctorate             1              1
Doctorate             1              0 *
Doctorate             0              0

Univariately, education=Doctorate seems to imply a greater propensity to drink coffee. However, if we include likes_coffee in a model, having education=Doctorate actually decreases your propensity to drink coffee. likes_coffee soaks up the overwhelming majority of the signal, but it's only possible to like coffee and not drink it if you have a Doctorate (starred row).
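To make the contrived example concrete, here is a minimal sketch that computes both views by hand. It assumes a stand-in "model" that simply predicts the observed mean of drinks_coffee in each (education, likes_coffee) cell; that is an illustration, not a random forest, and the names `rows`, `model`, and `partial` are made up for this sketch.

```python
from collections import defaultdict

# Toy data from the table above: (education, likes_coffee, drinks_coffee)
rows = [
    ("10th", 1, 1),
    ("10th", 0, 0),
    ("10th", 0, 0),
    ("10th", 0, 0),
    ("Doctorate", 1, 1),
    ("Doctorate", 1, 1),
    ("Doctorate", 1, 0),  # the starred row
    ("Doctorate", 0, 0),
]


def mean(values):
    values = list(values)
    return sum(values) / len(values)


# Univariate view: share of coffee drinkers within each education level.
univariate = {
    level: mean(y for edu, _, y in rows if edu == level)
    for level in ("10th", "Doctorate")
}

# Stand-in "model" (an assumption, not a random forest): the observed
# mean of drinks_coffee in each (education, likes_coffee) cell.
cells = defaultdict(list)
for edu, likes, y in rows:
    cells[(edu, likes)].append(y)
model = {cell: mean(ys) for cell, ys in cells.items()}

# Partial dependence: force education to `level` for every row, keep each
# row's own likes_coffee, and average the model's predictions.
partial = {
    level: mean(model[(level, likes)] for _, likes, _ in rows)
    for level in ("10th", "Doctorate")
}

print("univariate:", univariate)
print("partial dependence:", partial)
```

With these eight rows the univariate rates are 0.25 for 10th and 0.5 for Doctorate, while the partial dependence values are 0.5 and about 0.33, so the ordering flips once likes_coffee is accounted for.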

Does education rank high in relative influence? Are there other big predictors that could be explaining the massive univariate difference? Of course, it's always possible your model has a bug in it.