Solved – Interpreting partial dependence plots (marginal effects) using plotmo

boostingmachine learningrrandom forest

I'm computing partial dependence plots to visualize the relationship between predictors (including interactions) and outcome. I study patients with brain tumours and mortality is very high (90%). I use random forest (randomForest package) to perform binary classification (dead yes/no). Moreover, I use the plotmo package, which appears to be outstanding for visualizing such relationships, to visualize the variable relationships. Below is the output from plotmo, where the interaction between age and metastasis is visualized:

My questions are actually noted on the image. Using the built-in partial dependence plots in the randomForest package does not help to clarify these things). Reading the package tutorials on CRAN did not help as well.

Please refer to the image above to see what is unclear regarding this.

I'm deeply grateful for any thoughts you guys may offer.

Best Answer

About the vertical axis:

The plotmo function calls predict internally to generate the graph. So the vertical axis will be plotted in whatever units predict returns for your model.

In your example for a randomForest model, the default prediction type is "response" (see the help page for predict.randomForest). Change this by passing type="prob" to plotmo, which plotmo will pass on internally to predict.randomForest.

For example (see the two graphs on the left):

library(rpart.plot); data(ptitanic)  # for ptitanic data
dat <- ptitanic[,c("survived", "sex", "pclass")] # classic example
library(randomForest)
mod <- randomForest(survived~., data=dat)
plotmo(mod) # default predict type is "response"
plotmo(mod, type="prob")

plot

About the horizontal axes:

Since the persp function accepts only numeric (not factor) arguments , plotmo converts factor variables to numeric before invoking persp internally. Thus a factor of say "blue", "green", "red" will be plotted as 1,2,3 in plotmo's perspplot (1,2,3 are the integers used internally by R to represent the factor).

Axis labels:

Get more information on the axes by invoking persp with ticktype="detailed". To do this, pass persp.ticktype="detailed" to plotmo (any plotmo argument prefixed by persp. gets passed on internally to the persp function, this is described near the bottom of the plotmo help page).

For example (see the two graphs on the right):

plotmo(mod, type="prob", persp.ticktype="detailed")
plotmo(mod, type="prob", persp.ticktype="detailed", persp.nticks=3)

Bear in mind that plotmo does automatic determination of the response axis range. See the ylim argument of plotmo and Section 5.4 of the vignette for the plotmo package. This automatic determination can sometimes cause surprising results.

To plot partial dependence graphs, don't forget that we need to pass type="partdep" to plotmo. See Chapters 1 and 9 of the plotmo vignette.

Related Solutions

Solved – R: What do I see in partial dependence plots of gbm and RandomForest

I spent some time writing my own "partial.function-plotter" before I realized it was already bundled in the R randomForest library.

[EDIT ...but then I spent a year making the CRAN package forestFloor, which is by my opinion significantly better than classical partial dependence plots]

Partial.function plot are great in instances as this simulation example you show here, where the explaining variable do not interact with other variables. If each explaining variable contribute additively to the target-Y by some unknown function, this method is great to show that estimated hidden function. I often see such flattening in the borders of partial functions.

Some reasons: randomForsest has an argument called 'nodesize=5' which means no tree will subdivide a group of 5 members or less. Therefore each tree cannot distinguish with further precision. Bagging/bootstrapping layer of ensemple smooths by voting the many step functions of the individual trees - but only in the middle of the data region. Nearing the borders of data represented space, the 'amplitude' of the partial.function will fall. Setting nodesize=3 and/or get more observations compared to noise can reduce this border flatting effect... When signal to noise ratio falls in general in random forest the predictions scale condenses. Thus the predictions are not absolutely terms accurate, but only linearly correlated with target. You can see the a and b values as examples of and extremely low signal to noise ratio, and therefore these partial functions are very flat. It's a nice feature of random forest that you already from the range of predictions of training set can guess how well the model is performing. OOB.predictions is great also..

flattening of partial plot in regions with no data is reasonable: As random forest and CART are data driven modeling, I personally like the concept that these models do not extrapolate. Thus prediction of c=500 or c=1100 is the exactly same as c=100 or in most instances also c=98.

Here is a code example with the border flattening is reduced:

I have not tried the gbm package...

here is some illustrative code based on your eaxample...

#more observations are created...
a <- runif(5000, 1, 100)
b <- runif(5000, 1, 100)
c <- (1:5000)/50 + rnorm(100, mean = 0, sd = 0.1)
y <- (1:5000)/50 + rnorm(100, mean = 0, sd = 0.1)
par(mfrow = c(1,3))
plot(y ~ a); plot(y ~ b); plot(y ~ c)
Data <- data.frame(matrix(c(y, a, b, c), ncol = 4))
names(Data) <- c("y", "a", "b", "c")
library(randomForest)
#smaller nodesize "not as important" when there number of observartion is increased
#more tress can smooth flattening so boundery regions have best possible signal to             noise, data specific how many needed

plot.partial = function() {
partialPlot(rf.model, Data[,2:4], x.var = "a",xlim=c(1,100),ylim=c(1,100))
partialPlot(rf.model, Data[,2:4], x.var = "b",xlim=c(1,100),ylim=c(1,100))
partialPlot(rf.model, Data[,2:4], x.var = "c",xlim=c(1,100),ylim=c(1,100))
}

#worst case! : with 100 samples from Data and nodesize=30
rf.model <- randomForest(y ~ a + b + c, data = Data[sample(5000,100),],nodesize=30)
plot.partial()

#reasonble settings for least partial flattening by few observations: 100 samples and nodesize=3 and ntrees=2000
#more tress can smooth flattening so boundery regions have best possiblefidelity
rf.model <- randomForest(y ~ a + b + c, data = Data[sample(5000,100),],nodesize=5,ntress=2000)
plot.partial()

#more observations is great!
rf.model <- randomForest(y ~ a + b + c,
 data = Data[sample(5000,5000),],
 nodesize=5,ntress=2000)
plot.partial()

Solved – interpreting y axis of a partial dependence plots

Each point on the partial dependence plot is the average vote percentage in favor of the "Yes trees" class across all observations, given a fixed level of TRI.

It's not a probability of correct classification. It has absolutely nothing to do with accuracy, true negatives, and true positives.

When you see the phrase

Values greater than TRI 30 begin to have a positive influence for classification in your model

is an puffed-up way of saying

Values greater than TRI 30 begin to predict "Yes trees" more strongly than values lower than TRI 30

Best Answer

Related Solutions

Solved – R: What do I see in partial dependence plots of gbm and RandomForest

Solved – interpreting y axis of a partial dependence plots

Related Question