I was reading Friedman's book "The Elements of Statistical Learning" (2nd edition). On page 365, it discusses partial dependence plots. I don't quite understand how he actually calculates the partial dependence of f(X) on Xs. Say I built a model on 4 predictors and use the model itself as f(Xs, Xc). I want to know the partial dependence on my first predictor V1. If the data looks like:
V1  V2  V3  V4  predicted_p
 1   1   0   1  0.2
 1   2   0   0  0.24
 2   1   1   1  0.6
 2   2   0   1  0.4
Does the partial dependence for V1 have two values, one for V1 = 1 and the other for V1 = 2?
For V1 = 1, PD = (0.2 + 0.24)/2 = 0.22; for V1 = 2, PD = (0.6 + 0.4)/2 = 0.5.
Do I understand this right? Thank you.
Besides, are the means of the partial dependence functions equal across predictors?
Best Answer
Suppose that we have a data set $X = [x_s \, x_c] \in \mathbb R^{n \times p}$ where $x_s$ is a matrix of variables we want to know the partial dependencies for and $x_c$ is a matrix of the remaining predictors. Let $y \in \mathbb R^n$ be the vector of responses (i.e. a regression problem). Suppose that $y = f(x) + \epsilon$ and we estimate some fit $\hat f$.
Then $\hat f_s (x)$, the partial dependence of $\hat f$ at $x$ (here $x$ lives in the same space as $x_s$), is defined as:
$$\hat f_s(x) = {1 \over n} \sum_{i=1}^n \hat f(x, x_{c_i})$$
This says: hold $x$ constant for the variables of interest and average the predictions over all combinations of the other variables in the training set. So we need to pick the variables of interest, and also a region of the space that $x_s$ lives in that we care about. Note: be careful about extrapolating the marginal mean of $f(x)$ outside of this region.
Here's an example implementation in R. We start by creating an example dataset:
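A sketch of what that dataset creation might look like (the names `df`, `x1`, `x2`, `x3`, `y`, and the particular data-generating process are my own illustrative choices, not from the original):

```r
library(tidyverse)

set.seed(27)

# a small simulated regression dataset: y depends on x1, x2, x3 plus noise
n <- 100
df <- tibble(
  x1 = rnorm(n),
  x2 = rnorm(n),
  x3 = rnorm(n)
) %>%
  mutate(y = x1 + x2 * x3 + rnorm(n))
```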
Then we estimate $f$ using a random forest:
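For example, continuing from the hypothetical `df` above:

```r
library(randomForest)

# fit a random forest to all predictors; this is our estimate of f-hat
fit <- randomForest(y ~ ., data = df)
```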
Next we pick the feature we're interested in estimating partial dependencies for:
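Continuing the sketch, with `x1` as an illustrative choice:

```r
# the predictor we want the partial dependence for
var <- "x1"
</imports>
```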
Now we can split the dataset into this predictor and other predictors:
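One way to do the split, assuming the `df` and `var` objects sketched earlier:

```r
# x_s: the predictor of interest; x_c: the remaining predictors (drop y)
x_s <- select(df, all_of(var))
x_c <- select(df, -all_of(var), -y)
```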
Then we create a dataframe of all combinations of these datasets:
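With `tidyr::crossing()` this might look like (note that `crossing()` de-duplicates its inputs, which is harmless here since the simulated values are continuous):

```r
# every combination of the x_s values with the rows of x_c
grid <- crossing(x_s, x_c)
```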
We want to know the predictions of $\hat f$ at each point on this grid. I define a helper in the spirit of broom::augment() for this, and then marginalize by taking the average prediction for each point in $x_s$:
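A sketch of the helper and the marginalization step, continuing from the objects above (the helper name `augment_rf` and the column names `.fitted` and `yhat` are my own):

```r
# a helper in the spirit of broom::augment(): bind predictions onto the data
augment_rf <- function(fit, data) {
  data %>% mutate(.fitted = predict(fit, newdata = data))
}

# marginalize: average the predictions over x_c for each value of x_s
pd <- augment_rf(fit, grid) %>%
  group_by(across(all_of(var))) %>%
  summarize(yhat = mean(.fitted), .groups = "drop")
```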
We can visualize this as well:
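For instance, plotting the `pd` data frame from the previous step with ggplot2:

```r
# plot the estimated partial dependence curve for the chosen predictor
ggplot(pd, aes(.data[[var]], yhat)) +
  geom_line() +
  labs(x = var, y = "partial dependence")
```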
Finally, we can check this implementation against the pdp package to make sure it is correct.
For a classification problem, you can repeat a similar procedure, except predicting the class probability for a single class instead.
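The check against the pdp package might look like the following, continuing from the objects above (passing our `x_s` values as `pred.grid` so both implementations average over the same points):

```r
library(pdp)

# pdp::partial computes the same brute-force marginal average
pd_pkg <- partial(fit, pred.var = var, pred.grid = x_s, train = df)

# should agree with our manual implementation (up to row ordering)
all.equal(sort(pd$yhat), sort(pd_pkg$yhat))
```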