Solved – Tolerance in boosted regression trees

Tags: boosting, cart, r, supervised learning

Is anyone using the custom boosted regression tree (BRT) function from Elith et al. (2008), "A working guide to boosted regression trees", Journal of Animal Ecology? If so, do you know what tolerance.method = "fixed" versus tolerance.method = "auto" does in the following function?

This function is found in the dismo package in R.

https://cran.r-project.org/web/packages/dismo/dismo.pdf

?dismo::gbm.step

mdl <- gbm.step(data=df1,
                gbm.x = 4:7,
                gbm.y = 16,
                family = "gaussian", 
                tree.complexity = 1,
                learning.rate = 0.1,
                bag.fraction = 0.5,
                tolerance.method = "fixed",
                tolerance = 0.1)

Best Answer

The parameter tolerance.method determines how the tolerance threshold (which is used to determine the optimal number of trees) is calculated.

Let's look at the paper and the source code to understand first the purpose of the tolerance threshold, and then how it is calculated.

Purpose of the tolerance threshold

The paper (Elith, Leathwick & Hastie 2008) states (p. 807) that the function gbm.step implements cross-validation to determine the optimal number of trees (as detailed in Figure 4, which I will paste here since the PDF treats it as an image, not text.)

[Image: Figure 4 from Elith, Leathwick & Hastie (2008)]

The tolerance threshold helps determine when "the average of the more recent set is higher than the average of the previous set" (step 5).

The source code for gbm.step (line 159) shows that the algorithm will continue to build trees while (delta.deviance > tolerance.test & n.fitted < max.trees).

  • tolerance.test is the tolerance threshold
  • delta.deviance is initialised to 1 (line 150). It stays at 1, so it cannot fall below any tolerance threshold smaller than 1, until the loop counter j reaches 20; from then on it is recomputed on every pass:

(on line 220)

 if (j >= 20) {
   test1 <- mean(cv.loss.values[(j - 9):j])
   test2 <- mean(cv.loss.values[(j - 19):(j - 9)])
   delta.deviance <- test2 - test1
 }

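In other words, delta.deviance is the reduction in the mean of the loss values between the earlier window (indices j - 19 to j - 9) and the most recent 10 iterations (indices j - 9 to j). Note that, as written, the earlier window holds 11 values and shares index j - 9 with the recent one.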
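To make the stopping rule concrete, here is a minimal sketch of the loop on made-up numbers. The loss curve cv.loss.curve is synthetic (not output from gbm.step), max.rounds is a hypothetical stand-in for the max.trees guard, and the threshold value is assumed; only the while condition and the window indices are taken from the snippets above. The real function also fits additional trees on each pass, which is omitted here.

```r
# Synthetic, illustrative loss curve (made-up values, not gbm.step output):
# it decays towards 1, so improvements shrink as rounds accumulate.
cv.loss.curve <- 1 + exp(-(1:200) / 5)

tolerance.test <- 0.001   # assumed threshold for this illustration
max.rounds     <- 200     # hypothetical stand-in for the max.trees guard

delta.deviance <- 1       # initial value keeps the loop running at first
cv.loss.values <- numeric(0)
j <- 0

while (delta.deviance > tolerance.test & j < max.rounds) {
  j <- j + 1
  cv.loss.values[j] <- cv.loss.curve[j]   # pretend a new loss value was computed
  if (j >= 20) {
    test1 <- mean(cv.loss.values[(j - 9):j])          # most recent 10 values
    test2 <- mean(cv.loss.values[(j - 19):(j - 9)])   # earlier window, as indexed in the source
    delta.deviance <- test2 - test1                   # positive while the loss is still falling
  }
}

stopped.at <- j
print(c(stopped.at = stopped.at, delta.deviance = delta.deviance))
```

Because delta.deviance is only recomputed from j = 20 onwards, the loop cannot stop on the tolerance criterion before 20 passes, whatever threshold is used.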

It's worth noting an apparent discrepancy between the source code (v 1.1.1), which compares the most recent 10 iterations against the 11th to 20th most recent, and step 5 in the figure, which compares the most recent 5 against the 6th to 10th. The code is therefore a little more conservative, in that it averages the loss function over a larger window.

How is the tolerance threshold calculated?

By default, tolerance.method = "auto" and tolerance = 0.001.

On line 77:

  mean.total.deviance <- total.deviance/n.cases
  tolerance.test <- tolerance
  if (tolerance.method == "auto") {
    tolerance.test <- mean.total.deviance * tolerance
  }

Since there is no corresponding adjustment for tolerance.method == "fixed", the algorithm uses the tolerance argument (default or user-provided) as is.

So tolerance is interpreted either as an absolute change in deviance (tolerance.method = "fixed") or as a fraction of the mean total deviance (tolerance.method = "auto").
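A small sketch of that branch with made-up inputs may help. The function name tolerance_threshold() and the value assigned to mean.total.deviance are hypothetical and only illustrate the arithmetic; in gbm.step the mean total deviance is computed from your data.

```r
# Hypothetical helper mirroring the branch quoted above.
tolerance_threshold <- function(tolerance, tolerance.method, mean.total.deviance) {
  tolerance.test <- tolerance
  if (tolerance.method == "auto") {
    # "auto": scale the tolerance by the mean total deviance
    tolerance.test <- mean.total.deviance * tolerance
  }
  tolerance.test   # "fixed": returned unchanged
}

mean.total.deviance <- 2.5   # made-up value for illustration

fixed.threshold <- tolerance_threshold(0.1, "fixed", mean.total.deviance)
auto.threshold  <- tolerance_threshold(0.1, "auto",  mean.total.deviance)
print(c(fixed = fixed.threshold, auto = auto.threshold))
```

With these inputs the fixed threshold is 0.1 and the auto threshold is 0.25, so the same tolerance argument can give quite different stopping points depending on the scale of the response.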
