Solved – Interpretation of coefficients from glm Gamma

Tags: gamma distribution, generalized linear model, r, regression

I am attempting to fit a model to a dataset in which frequency (Hz) is the dependent variable. A generalized linear model based on a gamma distribution seems appropriate, since the values of the dependent variable lie in $(0, \infty)$ and I have confirmed with a Q-Q plot that the observed values align with a gamma distribution.

I am fitting the model in R using glm, but it is unclear to me what the estimates returned by summary.glm refer to.

The example from the documentation is provided for context:

clotting <- data.frame(
u = c(5,10,15,20,30,40,60,80,100),
lot1 = c(118,58,42,35,27,25,21,19,18))

glm.clotting <- glm(
 lot1 ~ log(u),
 data = clotting,
 family = Gamma
 )

summary(glm.clotting) 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept) -0.0165544  0.0009275  -17.85 4.28e-07 ***
log(u)       0.0153431  0.0004150   36.98 2.75e-09 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

I assume these refer to one of the parameters of the fitted gamma distribution, but I have not been able to find a clear explanation. Do these values refer to the gamma distribution mean $\mu = k\theta = \frac{\alpha}{\beta}$, the rate parameter $\beta$, or the scale parameter $\theta$?

Also, how should these estimates be interpreted differently for a continuous independent variable (as in the example) versus a discrete independent variable?

Best Answer

It will help other readers to point out that this data is an example from the stats package documentation page help("glm"), and you can run the example at the R prompt by:

example(glm)

The data you show were originally published by Hurn et al. (1945), and the glm analysis was originally proposed in Section 8.4.2 of McCullagh and Nelder (1989). The variable u is the percent concentration of normal blood plasma and lot1 is the blood clotting time in seconds (not a frequency measure, as you seem to think). Generally speaking, the higher the concentration of blood plasma, the faster the clotting.

The coefficients of a glm always relate to the mean $\mu$, by way of the assumed link function. The default link for a gamma glm is the inverse link, so the model that has been fitted is $$\frac{1}{\mu_i}=\beta_1+x_i\beta_2$$ where $\mu_i$ represents the mean of the $i$th observation and $x_i=\log(u_i)$. The results from the fitted model show that $\hat\beta_1=-0.017$ and $\hat\beta_2=0.015$.
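To make the role of the inverse link concrete, here is a minimal sketch that refits the documentation example and rebuilds the fitted means by hand from the two coefficients. The object names (fit, eta, mu_manual, mu_fitted) are chosen here for illustration and are not part of the original example.

```r
# Refit the clotting example from help("glm")
clotting <- data.frame(
  u    = c(5, 10, 15, 20, 30, 40, 60, 80, 100),
  lot1 = c(118, 58, 42, 35, 27, 25, 21, 19, 18))

fit <- glm(lot1 ~ log(u), data = clotting, family = Gamma)

b <- coef(fit)                        # beta_1 (intercept), beta_2 (slope on log(u))

# Linear predictor: with the inverse link this is 1/mu, not mu
eta <- b[1] + b[2] * log(clotting$u)

# Invert the link by hand to obtain the fitted means
mu_manual <- unname(1 / eta)

# Compare with the means that glm reports
mu_fitted <- unname(fitted(fit))
all.equal(mu_manual, mu_fitted)
```

Read this way, $\hat\beta_2$ is the change in $1/\mu$ (the reciprocal of the mean clotting time) per unit increase in $\log(u)$. It is not a change in $\mu$ itself, nor in the rate or scale parameter.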

By the way, you have omitted some of the output from summary(glm.clotting). The next line of output says

(Dispersion parameter for Gamma family taken to be 0.002446059)

which tells you that $1/\hat\alpha=0.0024$, because the dispersion of a gamma glm is the reciprocal of the shape parameter $\alpha$.
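As a rough sketch of how the dispersion connects back to the gamma parameters in the question, the snippet below extracts the dispersion from the summary, inverts it to get a shape estimate, and combines it with the fitted means to obtain per-observation rate and scale values. The names phi, alpha_hat, rate_hat and scale_hat are illustrative. Note that summary.glm reports a Pearson-based estimate of the dispersion, so the implied shape is not the maximum likelihood estimate.

```r
clotting <- data.frame(
  u    = c(5, 10, 15, 20, 30, 40, 60, 80, 100),
  lot1 = c(118, 58, 42, 35, 27, 25, 21, 19, 18))

fit <- glm(lot1 ~ log(u), data = clotting, family = Gamma)

# Dispersion as reported by summary.glm (Pearson-based estimate)
phi <- summary(fit)$dispersion

# Shape estimate: reciprocal of the dispersion (roughly 409 here)
alpha_hat <- 1 / phi

# The shape is common to all observations; the mean varies with u,
# so the implied rate and scale differ from observation to observation.
mu_hat    <- unname(fitted(fit))
rate_hat  <- alpha_hat / mu_hat   # beta_i  = alpha / mu_i
scale_hat <- mu_hat / alpha_hat   # theta_i = mu_i / alpha

# Implied variance of each observation: phi * mu_i^2
var_hat <- phi * mu_hat^2
```

So the regression coefficients describe the mean, the dispersion describes the shape, and the rate or scale for any given observation follows from the two together.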

So that explains this classic glm example dataset. If you want to learn about glms with categorical factors as predictors, then I suggest you post a new question in the context of a dataset that has categorical predictors. There is little point in trying to learn about factors and how they enter into linear models from the current dataset. I will point out that, when you type example(glm), the very first example (from Dobson 1990) involves two factors in a Poisson glm.

References

Hurn, M., Barker, N.W. and Magath, T.B., 1945. The determination of prothrombin time following the administration of dicumarol, 3,3'-methylenebis (4-hydroxycoumarin), with special reference to thromboplastin. Journal of Laboratory and Clinical Medicine, 30, pp.432-447.

McCullagh, P. and Nelder, J.A., 1989. Generalized Linear Models, 2nd ed. Chapman and Hall, London.