GLM Tweedie Distribution – How to Determine the Tweedie Dispersion Parameter in Generalized Linear Models

Tags: dispersion, generalized linear model, tweedie-distribution

I am running a Tweedie regression and I need to get the dispersion parameter (phi).
Here is some simple R code:

library(GLMsData)
library(tweedie)
library(statmod)

set.seed(999)
data(quilpie)

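# Profile the likelihood over the variance power (log link)
out <- tweedie.profile(Rain ~ Phase, do.plot=TRUE, data=quilpie, link.power = 0)
# Fit the Tweedie GLM at the selected variance power
model <- glm(Rain ~ Phase, data=quilpie, family=tweedie(var.power=out$xi.max, link.power=0))

out$phi.max                  # dispersion reported by tweedie.profile
summary(model)$dispersion    # dispersion reported by glm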

First I find the optimal var.power value and then I fit the model.
What I don't understand is why the phi.max estimate and the dispersion estimate are not the same.
What am I missing here?

Best Answer

For the benefit of other readers, Tweedie glms assume that the variance of the responses has the form $$ {\rm var}(y_i) = \phi \mu_i^\xi $$ where $\phi$ is the dispersion and $\xi$ is the variance power.

tweedie.profile returns maximum likelihood estimators for $\phi$ and $\xi$.

The dispersion estimator returned by glm is different because

1. it is a Pearson estimator based on squared residuals rather than a maximum likelihood estimator, and
2. it is adjusted for the fact that the linear model regression parameters have to be estimated, which the maximum likelihood estimator is not.
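
The two differences listed above can be checked numerically. The following is a minimal sketch rather than code from the answer: it assumes simulated data and uses base R's Gamma family (a Tweedie family with variance power 2) as a stand-in so that it runs without extra packages. The helper name `pearson_dispersion` is hypothetical.

```r
# Simulated data (an assumption for illustration): a 4-level factor and a
# gamma response with mean mu and dispersion 1/shape = 0.5.
set.seed(1)
n <- 200
group <- gl(4, n / 4)
mu <- exp(0.5 + 0.3 * as.integer(group))
y <- rgamma(n, shape = 2, rate = 2 / mu)

fit <- glm(y ~ group, family = Gamma(link = "log"))

# Hypothetical helper: Pearson chi-square divided by the residual degrees
# of freedom (n minus the number of estimated regression coefficients).
pearson_dispersion <- function(fit) {
  sum(residuals(fit, type = "pearson")^2) / df.residual(fit)
}

phi_pearson <- pearson_dispersion(fit)
phi_glm <- summary(fit)$dispersion
all.equal(phi_pearson, phi_glm)  # expected to agree up to rounding

# Dividing by n instead ignores the estimated coefficients and gives a
# smaller value. This only illustrates the degrees-of-freedom adjustment;
# it is not the maximum likelihood estimator.
phi_unadjusted <- sum(residuals(fit, type = "pearson")^2) / nobs(fit)
phi_unadjusted < phi_pearson
```

With the objects from the question, `pearson_dispersion(model)` would be expected to match `summary(model)$dispersion` in the same way.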

The dispersion estimator returned by glm is more conservative and will generally be slightly larger than that from tweedie.profile, especially when the sample size is small or you have a large number of covariates in the linear model. You can force glm to use the maximum likelihood estimate if you want by

summary(model, dispersion=out$phi.max)
anova(model, dispersion=out$phi.max)

but we recommend that you use the more conservative Pearson estimator from glm.
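
To see what supplying a dispersion value changes, here is a small sketch under stated assumptions: simulated data, base R's Gamma family as a stand-in for the Tweedie fit, and an arbitrary smaller dispersion `phi_alt` playing the role of `out$phi.max`. It shows that the standard errors scale with the square root of the dispersion, which is why the larger Pearson estimate gives the more conservative inference.

```r
# Simulated data (an assumption for illustration).
set.seed(2)
x <- runif(100)
y <- rgamma(100, shape = 3, rate = 3 / exp(1 + x))
fit <- glm(y ~ x, family = Gamma(link = "log"))

phi_default <- summary(fit)$dispersion  # Pearson estimate used by default
phi_alt <- 0.8 * phi_default            # arbitrary smaller value for illustration

se_default <- summary(fit)$coefficients[, "Std. Error"]
se_alt <- summary(fit, dispersion = phi_alt)$coefficients[, "Std. Error"]

# Ratio of standard errors compared with the square root of the
# dispersion ratio; the two are expected to coincide.
se_alt / se_default
sqrt(phi_alt / phi_default)
```

The coefficient estimates themselves do not change; only the standard errors and the tests based on them do.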

The different dispersion estimators are explained in detail in my book with Peter Dunn:

Dunn, PK, and Smyth, GK (2018). Generalized linear models with examples in R. Springer, New York, NY.

Related Solutions

Solved – Tweedie p parameter Interpretation

The Generalized Linear Models with Examples in R book by Peter Dunn and Gordon Smyth contains an illuminating discussion of Tweedie distributions.

If I may be so blunt as to summarize your excellent question:

What is the relation of $p$ and the underlying Poisson-Gamma model for a Tweedie distribution with $1 < p < 2$?

As you already note in your question, the Tweedie distribution with $1 < p < 2$ can be understood as a Poisson-Gamma model. To make it more concrete what this means, let's assume that

$$ N \sim \text{Pois}(\lambda^{*}) $$ and $$ z_i \sim \text{Gam}(\mu^{*}, \phi^{*}), $$ so that the observed $y$ is $$ y = \sum_{i = 1}^{N}{z_i}. $$ Dunn & Smyth give an example for this model, where $N$ is the number of insurance claims and $z_i$ is the cost of the $i$-th claim. In that case the model describes the total insurance payout.

The relation of $p$ to the parameters of the Poisson and Gamma distribution is

\begin{equation} \begin{aligned} \lambda^{*} &= \frac{\mu^{2-p}}{\phi (2-p)} \\ \mu^{*} &= (2 - p)\phi\mu^{p - 1} \\ \phi^{*} &= (2 - p)(p - 1) \phi^2\mu^{2(p - 1)}, \end{aligned} \end{equation} where $\mu$ and $\phi$ are the mean and overdispersion parameters from the generalized linear model definition.
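
These relations can be checked by simulation. The sketch below uses arbitrary illustrative values for $\mu$, $\phi$ and $p$, and it assumes that $\phi^{*}$ is read as the variance of each gamma term. That reading is an assumption of this sketch (the text above does not spell out the gamma parameterization), but it is the one under which the simulated sum reproduces the Tweedie mean $\mu$ and variance $\phi\mu^p$.

```r
# Illustrative Tweedie parameters (not taken from the text).
mu  <- 3
phi <- 2
p   <- 1.5

lambda_star <- mu^(2 - p) / (phi * (2 - p))
mu_star     <- (2 - p) * phi * mu^(p - 1)
phi_star    <- (2 - p) * (p - 1) * phi^2 * mu^(2 * (p - 1))

# Assumption: each gamma term has mean mu_star and variance phi_star,
# which corresponds to the shape and scale below.
shape <- mu_star^2 / phi_star
scale <- phi_star / mu_star

set.seed(1)
n_sim <- 1e5
N <- rpois(n_sim, lambda_star)

# A sum of N independent gamma(shape, scale) terms is gamma(N * shape, scale);
# the sum is exactly zero when N == 0.
y <- numeric(n_sim)
pos <- N > 0
y[pos] <- rgamma(sum(pos), shape = N[pos] * shape, scale = scale)

c(simulated = mean(y), theory = mu)          # mean of y
c(simulated = var(y), theory = phi * mu^p)   # variance of y
c(simulated = mean(y == 0), theory = exp(-lambda_star))  # probability of an exact zero
```

The last line illustrates why this range of $p$ suits data such as rainfall or insurance payouts: the model puts positive probability on exact zeros and is continuous on the positive values.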

Solved – How to calculate the Tweedie prediction based on model coefficients

When you pass glm() the tweedie family, the return value is a glm object. So you can use the predict() method, or call predict.glm() directly if you prefer to make it explicit to future readers of your code that this is a glm.

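library(statmod)        # provides the tweedie() family
example(tweedie)        # runs the help-page example; the call below relies on it defining y and x
twdeReg <- glm(y~x, family=tweedie(var.power=1, link.power=1))
predict(twdeReg)        # dispatches to predict.glm
predict.glm(twdeReg)    # same result, with the method named explicitly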

In the predict family of functions, pass the argument newdata=newDataName to predict on a new dataset; the default behavior is to predict on the data used for fitting. Also read ?predict.glm for the three options of the type argument: "link" gives the linear combination of predictors, "response" gives predictions on the y-scale, and "terms" gives the contribution of each model term on the linear-predictor scale, which I've never found super useful.
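
A minimal sketch of the newdata and type arguments follows. It assumes simulated data and uses base R's poisson(link = "sqrt") as a stand-in (in tweedie() terms this corresponds to var.power=1, link.power=0.5), so it runs without statmod; the data frame names are hypothetical.

```r
# Simulated training data (an assumption for illustration).
set.seed(7)
train <- data.frame(x = 1:30)
train$y <- rpois(30, lambda = (1 + 0.1 * train$x)^2)

fit <- glm(y ~ x, data = train, family = poisson(link = "sqrt"))

# Hypothetical new observations; column names must match the model formula.
new_data <- data.frame(x = c(31, 32, 33))

eta_new <- predict(fit, newdata = new_data, type = "link")      # linear predictor
mu_new  <- predict(fit, newdata = new_data, type = "response")  # mean of y

# For a sqrt link the response-scale prediction is the squared linear
# predictor, so these two are expected to agree.
all.equal(unname(eta_new)^2, unname(mu_new))
```

Note that type = "link" is the default for predict.glm, so predictions on the scale of y have to be requested explicitly.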

Added from comment on the reply:

To get this manually you'll need to use the equation from ?tweedie documentation that describes the link. The doc states: $\mu_i^q = \mathbb{E}(y_i|\vec{x}_i)^q = \vec{x}_i^T\vec{\beta}$ so if you want the expected value you'll need to calculate:

$$\mathbb{E}(y_i|\vec{x}_i) = (\vec{x}_i^T\vec{\beta})^{1/q},$$

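where $q$ is the link.power value. If q=1, as in the question, simply multiply each covariate value by its estimated coefficient and add up all of these products ( $\vec{x}_i^T\vec{\hat{\beta}}$ ), where the 'hat' denotes the estimate. Note that link.power=0 denotes the log link in tweedie(), so in that case the expected value is $\exp(\vec{x}_i^T\vec{\hat{\beta}})$ instead.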
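
The manual calculation can be compared with predict() as follows. This is a sketch under stated assumptions: simulated data, and base R's poisson(link = "sqrt") standing in for tweedie(var.power=1, link.power=0.5) so that the power $q = 0.5$ is not trivial and no extra package is needed.

```r
# Simulated data (an assumption for illustration).
set.seed(7)
x <- 1:30
y <- rpois(30, lambda = (1 + 0.1 * x)^2)
fit <- glm(y ~ x, family = poisson(link = "sqrt"))

q <- 0.5                   # link power of the sqrt link
x_new <- c(31, 32, 33)     # hypothetical new covariate values
X_new <- cbind(1, x_new)   # intercept column plus covariate

# Linear predictor x' beta-hat, then invert the power link.
eta <- drop(X_new %*% coef(fit))
mu_manual <- eta^(1 / q)

mu_predict <- predict(fit, newdata = data.frame(x = x_new), type = "response")
all.equal(mu_manual, unname(mu_predict))  # expected to agree up to rounding
```

For models with factors or interactions, building the design matrix by hand is error-prone; model.matrix() applied to the new data with the model's formula terms is a safer way to obtain `X_new`.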
