Solved – Partial Dependence Plot interpretation

[Figure: partial dependence plot]

This question concerns the partial dependence plots obtained from a random forest model. I am fitting a classification model, and I see negative probabilities on the y-axis. How should I interpret this?

Best Answer

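If this plot came from R's randomForest package (the partialPlot function), then the y-axis shows values on a logit scale (centered logs of the class vote fractions), not raw probabilities.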

As per the documentation:

The function being plotted is defined as: $$\tilde{f}(x) = \frac{1}{n} \sum_{i=1}^n f(x, x_{iC}),$$ where $x$ is the variable for which partial dependence is sought, and $x_{iC}$ is the other variables in the data. The summand is the predicted regression function for regression, and logits (i.e., log of fraction of votes) for which.class for classification: $$f(x) = \log p_k(x) - \frac{1}{K} \sum_{j=1}^K \log p_j(x),$$ where $K$ is the number of classes, $k$ is which.class, and $p_j$ is the proportion of votes for class $j$.

(http://www.rdocumentation.org/packages/randomForest/versions/4.6-12/topics/partialPlot)
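To make the classification formula concrete, here is a minimal Python sketch of the summand for a single observation. It is not the randomForest implementation: the function name `centered_log_vote` and its arguments are made up for illustration, and it assumes every vote proportion is strictly positive so that the logs are defined.

```python
import math

def centered_log_vote(proportions, which_class):
    """Centered log vote fraction: log p_k - (1/K) * sum_j log p_j.

    `proportions` holds the vote fraction for each of the K classes and
    `which_class` is the index k. Both names are illustrative only.
    Assumes all proportions are > 0.
    """
    logs = [math.log(p) for p in proportions]
    return logs[which_class] - sum(logs) / len(logs)

def logit(p):
    """Ordinary logit, log(p / (1 - p))."""
    return math.log(p / (1.0 - p))

# Two-class example: 30% of the trees vote for class 0.
value = centered_log_vote([0.3, 0.7], which_class=0)
print(value)             # negative, because 0.3 < 0.5
print(0.5 * logit(0.3))  # matches: with K = 2 the summand is half the logit
```

With two classes the summand reduces to half of the usual logit, which is why the plotted values behave like logits. The plot itself averages this quantity over all observations in the data.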

It is perfectly fine to have negative values, or for that matter values greater than 1, because the logit maps probabilities in (0, 1) onto the whole real line. Recall what the logit function looks like:

(https://en.wikipedia.org/wiki/Logit#/media/File:Logit.svg)

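With two classes, negative values correspond to probabilities (or in this case proportions of votes) less than 0.5 for which.class. With more than two classes, a negative value means that the log proportion for which.class is below the average log proportion across all classes, so the 0.5 threshold no longer applies exactly.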
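For a two-class problem, a value read off the y-axis can be mapped back to a rough vote proportion. The sketch below assumes the two-class form of the documented formula, where the plotted quantity is half the logit of the proportion. The helper `two_class_proportion` is a hypothetical name, not part of randomForest. Because the plot averages the log-scale values over observations, the result is only an indicative proportion, not the average predicted probability.

```python
import math

def two_class_proportion(plot_value):
    """Invert f = 0.5 * logit(p) for K = 2, giving p = 1 / (1 + exp(-2 f)).

    `plot_value` is a y-axis value from the partial dependence plot.
    Only valid under the two-class assumption stated above.
    """
    return 1.0 / (1.0 + math.exp(-2.0 * plot_value))

# A few hypothetical y-axis readings and the proportions they imply.
for f in (-1.0, 0.0, 1.5):
    print(f, round(two_class_proportion(f), 3))
```

A reading of 0 maps to a proportion of 0.5, negative readings map below 0.5, and positive readings map above it.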