I am working on a predictive modeling application where I have to predict claims. With classical GLMs, I would fit a Poisson GLM with log exposure as an offset, thereby assuming $$\mathbb{E}[\text{claims}] = \text{exposure} \cdot \exp \left( x^T \beta \right),$$ i.e., that expected claims are proportional to the exposure while still depending on the covariates. I want to use ctree, rpart, or other tree-based approaches. Is it possible to handle such an offset in these models in some way?
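Written on the log scale, the model above makes the role of the offset explicit: log exposure enters the linear predictor with a coefficient fixed at 1 rather than estimated. This is only a restatement of the formula above, not an additional assumption:

$$\log \mathbb{E}[\text{claims}] = \log(\text{exposure}) + x^T \beta$$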
R – Using Regression Trees to Model Rates: A Comprehensive Guide
cart, offset, r, rpart
Best Answer
One way would be to adopt a formal model-based tree. The `glmtree()` function in the `partykit` package implements the general MOB algorithm for model-based recursive partitioning (Zeileis et al. 2008, Journal of Computational and Graphical Statistics, 17(2), 492-514). This supports Poisson responses and also allows for the inclusion of offsets. Furthermore, additional regressors could be included in each of the terminal nodes.

Consider the following simple artificial example.
The example uses two simple partitioning variables (`x1` and `x2`) and an `exposure` variable. The response is Poisson-distributed with offset `log(exposure)` and a mean per unit of exposure of `1 = exp(0)`, except when both `x1 > 0.5 & x2 > 0.5`, where it is `exp(1)`.
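The data-generating process just described could be simulated roughly as follows. This is a minimal sketch: the seed, the sample size of 500, and the exposure range of 1 to 10 are assumptions made for illustration.

```r
set.seed(1)  # arbitrary seed, for reproducibility only
n <- 500     # assumed sample size

d <- data.frame(
  x1 = runif(n),
  x2 = runif(n),
  exposure = runif(n, min = 1, max = 10)  # assumed exposure range
)

# Rate per unit of exposure: exp(1) if both x1 > 0.5 and x2 > 0.5, else exp(0) = 1
high <- d$x1 > 0.5 & d$x2 > 0.5
d$claims <- rpois(n, lambda = d$exposure * exp(as.numeric(high)))

# Empirical claim rates in the two regimes (should be near 1 and exp(1))
tapply(d$claims, high, sum) / tapply(d$exposure, high, sum)
```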
Then `glmtree()` can fit a Poisson GLM-based tree for `claims` with `offset(log(exposure))` and partitioning variables `x1 + x2`.

This captures the true tree structure (which is admittedly easy to find here) and correctly estimates the intercepts (with the default log-link):
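A sketch of such a fit on simulated data of the kind described above. Here the offset is supplied through the `offset` argument of `glmtree()` rather than as an `offset()` term in the formula, which is an assumption about how the call could be written. The seed, sample size, and exposure range are likewise assumptions.

```r
library("partykit")  # assumes the partykit package is installed

# Simulated data as described above (seed, n, and exposure range are assumptions)
set.seed(1)
d <- data.frame(x1 = runif(500), x2 = runif(500), exposure = runif(500, 1, 10))
d$claims <- rpois(500, d$exposure * exp(as.numeric(d$x1 > 0.5 & d$x2 > 0.5)))

# Intercept-only Poisson GLM in each node, partitioned by x1 and x2,
# with log(exposure) as offset
tr <- glmtree(claims ~ 1 | x1 + x2, data = d,
  family = poisson, offset = log(exposure))

print(tr)  # tree structure with node-wise coefficients
coef(tr)   # estimated intercepts (log-rates) per terminal node
```

Because the link is the log, the intercepts are on the log-rate scale, so values near 0 and 1 correspond to rates near `exp(0)` and `exp(1)`.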
You can also obtain more detailed information about each fitted GLM in the nodes of the tree, e.g., for the last node:
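One way to do this is to call `summary()` on the tree with a `node` argument. The sketch below refits the same assumed simulated data and looks up the last terminal node via `nodeids()` instead of hard-coding its number, since the node numbering depends on the fitted tree.

```r
library("partykit")  # assumes the partykit package is installed

# Same assumed simulated data and fit as in the description above
set.seed(1)
d <- data.frame(x1 = runif(500), x2 = runif(500), exposure = runif(500, 1, 10))
d$claims <- rpois(500, d$exposure * exp(as.numeric(d$x1 > 0.5 & d$x2 > 0.5)))
tr <- glmtree(claims ~ 1 | x1 + x2, data = d,
  family = poisson, offset = log(exposure))

# Identify the last terminal node and summarize the GLM fitted in it
last <- max(nodeids(tr, terminal = TRUE))
summary(tr, node = last)
```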
More details and references are provided in `vignette("mob", package = "partykit")`.