GLM Tweedie Distribution – How to Determine the Tweedie Dispersion Parameter in Generalized Linear Models

Tags: dispersion, generalized linear model, tweedie-distribution

I am running a Tweedie regression and need to obtain the dispersion parameter (phi).
Here is some simple R code:

library(GLMsData)
library(tweedie)
library(statmod)

set.seed(999)
data(quilpie)

out <- tweedie.profile(Rain ~ Phase, do.plot=TRUE, data=quilpie, link.power = 0)
model <- glm(Rain ~ Phase, data=quilpie, family=tweedie(var.power=out$xi.max, link.power=0))

out$phi.max
summary(model)$dispersion

First I find the optimal var.power value with tweedie.profile and then fit the model with glm.
What I don't understand is why the phi.max estimate and the dispersion estimate are not the same.
What am I missing here?

Best Answer

For the benefit of other readers, Tweedie glms assume that the variance of the responses has the form $$ {\rm var}(y_i) = \phi \mu_i^\xi $$ where $\phi$ is the dispersion and $\xi$ is the variance power.

tweedie.profile returns maximum likelihood estimators for $\phi$ and $\xi$.

The dispersion estimator returned by glm is different because

  1. it is a Pearson estimator based on squared residuals rather than a maximum likelihood estimator and

  2. it is adjusted for the fact that the linear model regression parameters have to be estimated, which the maximum likelihood estimator is not.
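To make the two points above concrete, here is a minimal sketch of how the Pearson estimator can be reproduced by hand: the sum of squared Pearson residuals divided by the residual degrees of freedom, $\hat\phi = X^2/(n-p)$. The variance power `xi <- 1.5` is only an illustrative assumption so that the sketch does not depend on `tweedie.profile`; in the question it would be `out$xi.max`. The names `fit`, `pearson_res` and `phi_pearson` are made up for this example.

```r
library(GLMsData)  # provides the quilpie data set
library(statmod)   # provides the tweedie() family

data(quilpie)

# Assumption: an illustrative variance power; the question uses out$xi.max here
xi <- 1.5

fit <- glm(Rain ~ Phase, data = quilpie,
           family = tweedie(var.power = xi, link.power = 0))

# Pearson residuals are (y - mu) / sqrt(V(mu)), with V(mu) = mu^xi
pearson_res <- residuals(fit, type = "pearson")

# Divide by n - p (residual degrees of freedom), not n:
# this is the adjustment for having estimated the regression coefficients
phi_pearson <- sum(pearson_res^2) / df.residual(fit)

phi_pearson
summary(fit)$dispersion  # should agree with the hand computation
```

Dividing by `df.residual(fit)` rather than by the number of observations is what point 2 refers to.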

The dispersion estimator returned by glm is more conservative and will generally be slightly larger than that from tweedie.profile, especially when the sample size is small or the linear model has a large number of covariates. If you want, you can force glm to use the maximum likelihood estimate by passing it as the dispersion argument

summary(model, dispersion=out$phi.max)
anova(model, dispersion=out$phi.max)

but we recommend that you use the more conservative Pearson estimator from glm.
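As a rough illustration of what supplying a dispersion value changes: the coefficient estimates stay the same, and the standard errors are rescaled by the square root of the ratio of the two dispersion values. In this sketch `phi_alt <- 5` is a hypothetical stand-in for `out$phi.max` and `xi <- 1.5` is an illustrative variance power; neither value comes from the actual profile likelihood.

```r
library(GLMsData)  # provides the quilpie data set
library(statmod)   # provides the tweedie() family

data(quilpie)

xi <- 1.5      # assumption: illustrative variance power, not out$xi.max
phi_alt <- 5   # assumption: hypothetical stand-in for out$phi.max

fit <- glm(Rain ~ Phase, data = quilpie,
           family = tweedie(var.power = xi, link.power = 0))

# Standard errors using the default Pearson dispersion estimate
se_pearson <- coef(summary(fit))[, "Std. Error"]

# Standard errors when the dispersion is supplied by the user
se_alt <- coef(summary(fit, dispersion = phi_alt))[, "Std. Error"]

phi_pearson <- summary(fit)$dispersion

se_alt / se_pearson           # the same ratio for every coefficient
sqrt(phi_alt / phi_pearson)   # equals that ratio
```

A smaller supplied dispersion therefore gives smaller standard errors, which is why the larger Pearson value is the more conservative choice.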

The different dispersion estimators are explained in detail in my book with Peter Dunn:

Dunn, PK, and Smyth, GK (2018). Generalized linear models with examples in R. Springer, New York, NY.