Solved – How to modify default parameters of a gbm.step plot

boosting, data visualization, r

I am using the function gbm.step() from the dismo package to find the optimal number of boosting trees by k-fold cross-validation for a classification problem. The function fits a model with the optimal number of trees and returns it as a gbm model, along with additional information from the cross-validation selection process. It also draws a plot with default parameters.

This is my plot:

[Image: the default plot produced by gbm.step]

How can I change the default settings? Specifically, I want a different title and different x and y labels. After searching online and reading the dismo package documentation, I didn't find anything.

Thanks.

Best Answer

If you check the source (tar.gz), you can see how the plot is made by gbm.step. Most of the settings, like the labels and colors, are hard-coded. But it's possible to suppress the generated plot and make your own from the result.

    y.bar <- min(cv.loss.values)
    ...

      y.min <- min(cv.loss.values - cv.loss.ses)
      y.max <- max(cv.loss.values + cv.loss.ses)

      if (plot.folds) {
        y.min <- min(cv.loss.matrix)
        y.max <- max(cv.loss.matrix)
      }

      plot(trees.fitted, cv.loss.values, type = 'l', ylab = "holdout deviance", xlab = "no. of trees", ylim = c(y.min, y.max), ...)
      abline(h = y.bar, col = 2)

      lines(trees.fitted, cv.loss.values + cv.loss.ses, lty = 2)
      lines(trees.fitted, cv.loss.values - cv.loss.ses, lty = 2)

      if (plot.folds) {
        for (i in 1:n.folds) {
          lines(trees.fitted, cv.loss.matrix[i, ], lty = 3)
        }
      }
    }
    target.trees <- trees.fitted[match(TRUE, cv.loss.values == y.bar)]

    if (plot.main) {
      abline(v = target.trees, col = 3)
      title(paste(sp.name, ", d - ", tree.complexity, ", lr - ", learning.rate, sep = ""))
    }

Fortunately, most of the variables in the above code are returned as members of the result object, sometimes with slightly different names (notably, cv.loss.values -> cv.values).

Here's an example of calling gbm.step with plot.main = FALSE to suppress the built-in plot and then creating the plot from the result object.

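    library(dismo)  # provides gbm.step() and the Anguilla_train data set

    data(Anguilla_train)
    # plot.main = FALSE suppresses the built-in plot
    m <- gbm.step(data = Anguilla_train, gbm.x = 3:14, gbm.y = 2, family = "bernoulli",
                  tree.complexity = 5, learning.rate = 0.01, bag.fraction = 0.5,
                  plot.main = FALSE)

    # Minimum mean holdout deviance and the y-axis range (mean +/- one standard error)
    y.bar <- min(m$cv.values)
    y.min <- min(m$cv.values - m$cv.loss.ses)
    y.max <- max(m$cv.values + m$cv.loss.ses)

    # Mean cross-validation curve with custom axis labels
    plot(m$trees.fitted, m$cv.values, type = 'l', ylab = "My Dev", xlab = "My Count", ylim = c(y.min, y.max))
    abline(h = y.bar, col = 3)

    # Dashed lines one standard error above and below the mean curve
    lines(m$trees.fitted, m$cv.values + m$cv.loss.ses, lty = 2)
    lines(m$trees.fitted, m$cv.values - m$cv.loss.ses, lty = 2)

    # Vertical line at the number of trees with the lowest deviance, plus a custom title
    target.trees <- m$trees.fitted[match(TRUE, m$cv.values == y.bar)]
    abline(v = target.trees, col = 4)
    title("My Title")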

[Image: the resulting plot with the custom title and axis labels]
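If you need this plot more than once, the same steps could be wrapped in a small function. This is only a sketch: plot_gbm_step is a hypothetical helper name (not part of dismo), and it assumes the result object carries trees.fitted, cv.values, cv.loss.ses and cv.loss.matrix (one row per fold), as the source excerpt above suggests. Unlike the original code, it widens the y-range to cover both the standard-error band and the individual folds when plot.folds = TRUE.

```r
# Hypothetical helper (not part of dismo): redraws the gbm.step
# cross-validation curve with a user-chosen title and axis labels.
plot_gbm_step <- function(m, main = "", xlab = "no. of trees",
                          ylab = "holdout deviance", plot.folds = FALSE, ...) {
  trees <- m$trees.fitted
  cv    <- m$cv.values
  ses   <- m$cv.loss.ses

  # y-range: mean curve +/- one standard error, widened to the folds if requested
  y.range <- range(cv - ses, cv + ses)
  if (plot.folds) y.range <- range(y.range, m$cv.loss.matrix)

  # mean holdout deviance with dashed standard-error lines
  plot(trees, cv, type = "l", xlab = xlab, ylab = ylab,
       ylim = y.range, main = main, ...)
  lines(trees, cv + ses, lty = 2)
  lines(trees, cv - ses, lty = 2)

  # optional dotted line per cross-validation fold (assumes one row per fold)
  if (plot.folds) {
    for (i in seq_len(nrow(m$cv.loss.matrix))) {
      lines(trees, m$cv.loss.matrix[i, ], lty = 3)
    }
  }

  # mark the minimum deviance and the corresponding number of trees
  target.trees <- trees[which.min(cv)]
  abline(h = min(cv), col = 2)
  abline(v = target.trees, col = 3)

  invisible(target.trees)
}
```

With a model fitted using plot.main = FALSE, a call such as plot_gbm_step(m, main = "My Title", xlab = "My Count", ylab = "My Dev") should give a plot like the one above. Extra graphical parameters (for example cex.lab) are passed on to plot() through the dots.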