R – Scaling vs Offsetting in Quasi-Poisson Generalized Linear Model (GLM)

Tags: generalized linear model, offset, Poisson distribution, quasi-likelihood, R

I recently modeled insurance claim frequency assuming a Quasi-Poisson distribution in R. My frequency dependent variable was calculated, in advance of modeling, as the number of claims per underlying exposure. Someone commented that it would be incorrect to do this, and that the correct method was to model the observed claim counts and use the underlying exposure as an offset.

Can someone explain the theoretical and practical differences between scaling and offsetting? I ran the code both ways and noticed that the deviance residuals are significantly smaller when using offsets, but otherwise there were no material differences in variable selection.
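Roughly, the two versions I ran look like this (a simplified sketch; `dat`, `claims`, `exposure`, `x1` and `x2` stand in for my actual data and predictors):

```r
# Version 1: pre-computed frequency (claims per unit of exposure) as the response
fit_rate <- glm(claims / exposure ~ x1 + x2,
                family = quasipoisson(link = "log"),
                data = dat)

# Version 2: observed counts with log-exposure as an offset
fit_offset <- glm(claims ~ x1 + x2 + offset(log(exposure)),
                  family = quasipoisson(link = "log"),
                  data = dat)

summary(fit_rate)
summary(fit_offset)
```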

P.S. Note the similar but still different question here.

Best Answer

We can consider the GLM as having two components: a model for the mean and a model for the variance. This is even more explicit in the quasi-GLM case.

The mean is assumed proportional to the exposure; with a log link (which is what I presume you have), you could try to adjust for the effect of exposure on the mean either by dividing the response by exposure or by using an offset of log-exposure. Both have the same effect on the mean.
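To see the equivalence for the mean explicitly, write $\mu_i$ for the mean count, $e_i$ for the exposure and $\mathbf{x}_i^\top\boldsymbol\beta$ for the linear predictor; then

$$\log\!\left(\frac{\mu_i}{e_i}\right)=\mathbf{x}_i^\top\boldsymbol\beta \quad\Longleftrightarrow\quad \log(\mu_i)=\mathbf{x}_i^\top\boldsymbol\beta+\log(e_i),$$

so modelling the rate and modelling the count with an offset of $\log(e_i)$ describe the same mean.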

However, depending on the particular distribution that's operating*, they can have different effects on the variance.

*(as well as other drivers like dependence and unmodelled effects)

When you divide by exposure you divide the variance by exposure-squared (this is just a basic variance property - $\text{Var}(\frac{X}{e_i})=\frac{1}{e_i^2} \text{Var}(X)$). Equivalently, scaling by exposure reduces the standard deviation in proportion to the mean (leaving the coefficient of variation constant). This might suit claim amounts but doesn't fit with a quasi-Poisson model for claim counts.
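As a sketch of what this means under a quasi-Poisson variance assumption: suppose the count $Y_i$ has mean $\mu_i = e_i\lambda_i$ and variance $\phi\mu_i$. Then

$$\text{Var}\!\left(\frac{Y_i}{e_i}\right)=\frac{\phi\, e_i\lambda_i}{e_i^2}=\frac{\phi}{e_i}\,\lambda_i,$$

so the rate $Y_i/e_i$ still has variance proportional to its mean $\lambda_i$, but with an effective dispersion $\phi/e_i$ that changes from observation to observation; fitting the rates with a single constant dispersion therefore misstates the variance unless all exposures are equal. Keeping the counts and using an offset of $\log(e_i)$ leaves the variance as $\phi\mu_i$ with one common $\phi$.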

[For example, a model for aggregate claim payments might use a Gamma GLM, which has variance proportional to the mean squared, i.e. a constant coefficient of variation. There, an offset of log-exposure reduces the fitted mean by a factor of exposure and so (because the model has variance proportional to the mean squared) reduces the variance by the square of exposure. So for a Gamma GLM with log link the two approaches are identical; this is also true for other models where the mean is proportional to a scale parameter and the variance is proportional to the square of the mean, including lognormal models, Weibull models and a number of others.]
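A quick simulation can be used to check the bracketed claim; the code below is purely illustrative (made-up data, not from the question), but for a Gamma GLM with log link the offset and scaled fits should agree to numerical precision:

```r
set.seed(1)
n        <- 500
exposure <- runif(n, 0.5, 5)
x        <- rnorm(n)
mu       <- exposure * exp(0.2 + 0.5 * x)         # mean proportional to exposure
amount   <- rgamma(n, shape = 2, rate = 2 / mu)   # Gamma with mean mu, constant CV

# Offset formulation on the raw amounts
fit_off <- glm(amount ~ x + offset(log(exposure)), family = Gamma(link = "log"))

# Scaled formulation on amount / exposure
fit_scl <- glm(I(amount / exposure) ~ x, family = Gamma(link = "log"))

# Coefficients and standard errors agree up to numerical error
cbind(coef(summary(fit_off))[, 1:2], coef(summary(fit_scl))[, 1:2])
```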

For a quasi-Poisson GLM with log link, the variance in the model is proportional to the mean, not the mean squared. As such, fitting log-exposure as an offset reduces the fitted variance in the way the model describes, in proportion to the change in mean. As we saw above, when you divide by exposure you change the variance according to the mean squared instead.
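A small simulation (again purely illustrative, using plain Poisson counts so the true dispersion is 1) shows the contrast: the variance-to-mean ratio of the counts stays near 1 at every exposure level, while for the pre-scaled rates it falls like $1/e$:

```r
set.seed(2)
lambda <- 0.3                     # true claim frequency per unit of exposure
reps   <- 1e5

for (e in c(1, 10, 100)) {
  counts <- rpois(reps, lambda * e)
  rates  <- counts / e
  cat(sprintf("exposure %4.0f: Var/mean of counts = %.3f, Var/mean of rates = %.3f\n",
              e, var(counts) / mean(counts), var(rates) / mean(rates)))
}
```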

If the quasi-Poisson model were actually the correct model for your counts, then you should certainly use an offset of log-exposure, since it would describe the impact on the variance correctly, as Ben indicated.


However, for claim counts, a quasi-Poisson model is at best a rough approximation.

If you have heterogeneity, a negative binomial would tend to model the variability better, and it doesn't have variance proportional to the mean; however, often it, too, doesn't really capture the variance effect: some important drivers of claim frequency may lead to an even stronger relationship with the mean.
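If you want to try that route, one common option (a sketch only, reusing the hypothetical `dat`, `claims`, `exposure`, `x1`, `x2` from the question) is MASS::glm.nb, which takes the exposure the same way, as a log offset:

```r
library(MASS)

# Negative binomial with log link; the variance is mu + mu^2 / theta,
# so it grows faster than the mean as the mean increases.
fit_nb <- glm.nb(claims ~ x1 + x2 + offset(log(exposure)), data = dat)
summary(fit_nb)
```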

Realistically, exposure won't exactly impact the variance in proportion to the mean. Many effects we're aware of will work to make that contribution to the variance increase somewhat faster than the mean does.

For counts, the variance assumption in the quasi-Poisson model will at least sometimes be close to correct; if your model is quasi-Poisson, then you'll certainly get the variance wrong (according to your model) if you divide by exposure.

You can assess whether the variance is well approximated as proportional to the mean at model-fitting time by looking at the usual model diagnostics (if it isn't, you shouldn't be using a model that says it is; if it is, then you should deal with exposure properly, according to your model).
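One simple version of such a check (a sketch, reusing the hypothetical offset fit from the question): bin the observations by fitted mean and compare the within-bin variance of the counts with the within-bin mean, or plot absolute Pearson residuals against the fitted values:

```r
fitted_mu <- fitted(fit_offset)

# Bin observations by fitted mean and compare variance to mean within bins;
# under a quasi-Poisson variance the points should lie roughly on a straight
# line through the origin with slope equal to the dispersion.
bins   <- cut(fitted_mu,
              breaks = unique(quantile(fitted_mu, probs = seq(0, 1, 0.1))),
              include.lowest = TRUE)
by_bin <- data.frame(mean = tapply(dat$claims, bins, mean),
                     var  = tapply(dat$claims, bins, var))
plot(by_bin$mean, by_bin$var, xlab = "bin mean", ylab = "bin variance")
abline(0, 1, lty = 2)  # reference line: variance = mean (dispersion 1)

# Absolute Pearson residuals against fitted values should show no strong trend
# if the variance function is roughly right.
plot(fitted_mu, abs(residuals(fit_offset, type = "pearson")),
     xlab = "fitted mean", ylab = "|Pearson residual|")
```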

[Of course, exposure may not impact the variance in the model the same way as the rest of the drivers tend to, but that might be introducing more complexity than you have data to deal with.]
