I would like to know whether anyone is using the custom BRT function from Elith et al. (2008), "A working guide to boosted regression trees", Journal of Animal Ecology, and knows what tolerance.method = "fixed" or tolerance.method = "auto" does in the following function:
This function is found in the dismo package in R.
https://cran.r-project.org/web/packages/dismo/dismo.pdf
?dismo::gbm.step
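library(dismo)  # provides gbm.step

# df1 is the asker's data frame (not shown in the question)
mdl <- gbm.step(data = df1,
                gbm.x = 4:7,               # predictor columns
                gbm.y = 16,                # response column
                family = "gaussian",
                tree.complexity = 1,
                learning.rate = 0.1,
                bag.fraction = 0.5,
                tolerance.method = "fixed",
                tolerance = 0.1)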
Best Answer
The parameter tolerance.method determines how the tolerance threshold (which is used to determine the optimal number of trees) is calculated. Let's look at the paper and the source code to understand first the purpose of the tolerance threshold, and then how it is calculated.
Purpose of the tolerance threshold
The paper (Elith, Leathwick & Hastie 2008) states (p. 807) that the function gbm.step implements cross-validation to determine the optimal number of trees, as detailed in Figure 4 of the paper. The tolerance threshold helps determine when "the average of the more recent set is higher than the average of the previous set" (step 5).
The source code for gbm.step (line 159) shows that the algorithm will continue to build trees while (delta.deviance > tolerance.test & n.fitted < max.trees). Here tolerance.test is the tolerance threshold. delta.deviance is initialised to 1 (line 150), a value that stays above the tolerance threshold until it is recalculated. Once at least 20 trees have been built, it is recalculated (on line 220) as the reduction in the mean of the loss function over the most recent 10 iterations compared with the 10 iterations before that.
It's worth noting the apparent discrepancy between the source code (v 1.1.1), which compares the current to 10th previous iterations against the 11th-20th previous iterations, and step 5 in the figure, which compares the current to 5th against the 6th-10th. So the code is a little more conservative, in that it uses a larger window for averaging the loss function.
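To make the stopping rule concrete, here is a minimal sketch in R. It follows the description above rather than reproducing the dismo source: delta_deviance is a hypothetical helper, the loss values are made up, and the window argument is only there to contrast the 10-iteration window described for the code with the 5-iteration window in the figure.

```r
# Hypothetical helper, not part of dismo: change in mean holdout loss between
# the most recent `window` iterations and the `window` iterations before them.
delta_deviance <- function(loss, window = 10) {
  j <- length(loss)
  if (j < 2 * window) return(NA_real_)  # not enough iterations yet
  recent   <- mean(loss[(j - window + 1):j])
  previous <- mean(loss[(j - 2 * window + 1):(j - window)])
  previous - recent  # positive while the loss is still falling
}

# Made-up loss curve that falls quickly and then flattens out
loss <- 1 + exp(-(1:40) / 5)

# With a fixed tolerance of 0.1, fitting would continue while this is TRUE
delta_deviance(loss[1:20]) > 0.1   # early on: still improving
delta_deviance(loss) > 0.1         # later: the curve has flattened

# The narrower window from step 5 of the figure
delta_deviance(loss, window = 5)
```

A wider window averages over more iterations, so a brief rise in the loss is less likely to trigger the stop.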
How is the tolerance threshold calculated?
By default, tolerance.method = "auto" and tolerance = 0.001. On line 77, when tolerance.method == "auto", tolerance.test is adjusted relative to the mean total deviance. Since there is no corresponding adjustment for tolerance.method == "fixed", the algorithm uses the default or user-provided tolerance without adjustment. So you can specify tolerance.test as an absolute deviance (via tolerance.method = "fixed") or relative to the mean total deviance (via tolerance.method = "auto").
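The difference between the two methods can be sketched as follows. This is an illustration, not the dismo code: tolerance_threshold is a hypothetical helper, the mean total deviance of 250 is invented, and it assumes that "relative to the mean total deviance" means multiplying the tolerance by it.

```r
# Hypothetical helper, not a dismo function.
# ASSUMPTION: "auto" scales the tolerance by the mean total deviance;
# "fixed" uses the supplied tolerance unchanged.
tolerance_threshold <- function(tolerance,
                                method = c("auto", "fixed"),
                                mean_total_deviance) {
  method <- match.arg(method)
  if (method == "auto") tolerance * mean_total_deviance else tolerance
}

# Invented mean total deviance, for illustration only
mtd <- 250

tolerance_threshold(0.001, "auto",  mean_total_deviance = mtd)  # scaled by mtd
tolerance_threshold(0.1,   "fixed", mean_total_deviance = mtd)  # used as given
```

Under this reading, a given "auto" tolerance adapts to the scale of the response, whereas a "fixed" tolerance such as the 0.1 in the question is compared directly with delta.deviance.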