I am running a Tweedie regression and I need to get the dispersion parameter (phi).
Here is some simple R code:
library(GLMsData)
library(tweedie)
library(statmod)
set.seed(999)
data(quilpie)
out <- tweedie.profile(Rain ~ Phase, do.plot=TRUE, data=quilpie, link.power = 0)
model <- glm(Rain ~ Phase, data=quilpie, family=tweedie(var.power=out$xi.max, link.power=0))
out$phi.max
summary(model)$dispersion
First I find the optimal var.power value, and then I fit the model.
What I don't understand is why the phi.max estimate and the dispersion estimate are not the same.
What am I missing here?
Best Answer
For the benefit of other readers, Tweedie glms assume that the variance of the responses has the form $$ {\rm var}(y_i) = \phi \mu_i^\xi $$ where $\phi$ is the dispersion and $\xi$ is the variance power.
tweedie.profile returns maximum likelihood estimators for $\phi$ and $\xi$. The dispersion estimator returned by glm is different because (1) it is a Pearson estimator based on squared residuals rather than a maximum likelihood estimator, and (2) it is adjusted for the fact that the linear model regression parameters have to be estimated, which the maximum likelihood estimator is not.
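The comparison above can be checked by hand. The following is a minimal sketch, reusing the model from the question, that computes the Pearson estimator as $\hat\phi = X^2/(n-p)$ with $X^2 = \sum_i (y_i - \hat\mu_i)^2 / \hat\mu_i^{\xi}$. The symbols $n$ (number of observations) and $p$ (number of estimated regression coefficients) and the variable names `pearson_chisq`, `phi_pearson` and `phi_unadjusted` are introduced here for illustration only. Dividing the same statistic by $n$ instead of $n-p$ shows the size of the degrees-of-freedom adjustment on its own.

```r
library(GLMsData)  # quilpie data
library(tweedie)   # tweedie.profile()
library(statmod)   # tweedie() family for glm()

data(quilpie)

# Same profile and fit as in the question, without the plot
out <- tweedie.profile(Rain ~ Phase, data = quilpie,
                       link.power = 0, do.plot = FALSE)
model <- glm(Rain ~ Phase, data = quilpie,
             family = tweedie(var.power = out$xi.max, link.power = 0))

# Pearson statistic: sum of squared Pearson residuals
pearson_chisq <- sum(residuals(model, type = "pearson")^2)

# Pearson estimator: divide by the residual degrees of freedom (n - p)
phi_pearson <- pearson_chisq / df.residual(model)

# Illustration only: the same statistic divided by n,
# i.e. with no adjustment for the estimated coefficients
phi_unadjusted <- pearson_chisq / nobs(model)

# Side-by-side comparison of the estimates
c(pearson_by_hand = phi_pearson,
  glm_summary     = summary(model)$dispersion,
  unadjusted      = phi_unadjusted,
  profile_mle     = out$phi.max)
```

The first two entries should agree, since summary() on a glm fit with an estimated dispersion reports this Pearson estimator; the last entry is the maximum likelihood estimate from tweedie.profile.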
The dispersion estimator returned by glm is more conservative and will generally be slightly larger than that from tweedie.profile, especially when the sample size is small or you have a large number of covariates in the linear model. You can force glm to use the maximum likelihood estimate if you want, but we recommend that you use the more conservative Pearson estimator from glm.
The different dispersion estimators are explained in detail in my book with Peter Dunn:
Dunn, PK, and Smyth, GK (2018). Generalized linear models with examples in R. Springer, New York, NY.
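As for forcing glm to use the maximum likelihood estimate: one way to do this, sketched here as an assumption rather than as the call the answer necessarily had in mind, is the `dispersion` argument of summary() for glm fits. The names `s_pearson`, `s_mle`, `se_pearson` and `se_mle` are illustrative only.

```r
library(GLMsData)  # quilpie data
library(tweedie)   # tweedie.profile()
library(statmod)   # tweedie() family for glm()

data(quilpie)

# Same profile and fit as in the question, without the plot
out <- tweedie.profile(Rain ~ Phase, data = quilpie,
                       link.power = 0, do.plot = FALSE)
model <- glm(Rain ~ Phase, data = quilpie,
             family = tweedie(var.power = out$xi.max, link.power = 0))

# Default summary: dispersion estimated by the Pearson estimator
s_pearson <- summary(model)

# Summary with the dispersion fixed at the ML estimate from tweedie.profile
s_mle <- summary(model, dispersion = out$phi.max)

s_pearson$dispersion
s_mle$dispersion

# The coefficient estimates are unchanged; only the standard errors
# are rescaled, by the square root of the ratio of the two dispersions
se_pearson <- coef(s_pearson)[, "Std. Error"]
se_mle     <- coef(s_mle)[, "Std. Error"]
cbind(se_pearson, se_mle)
```

Because the fitted coefficients do not depend on the dispersion, the choice of estimator only affects the standard errors and the tests built on them, which is why the larger Pearson value is the more conservative option.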