I am attempting to fit a model to a dataset in which frequency (Hz) is the dependent variable. A generalized linear model based on a gamma distribution seems appropriate, since the values of the dependent variable range over $0 \rightarrow \infty$, and I have confirmed with a Q-Q plot that the observed values align with a gamma distribution.
I am attempting to fit the model in R using `glm`; however, it is unclear to me what the estimate values returned by `summary.glm` refer to.
The example from the documentation is provided for context:
clotting <- data.frame(
u = c(5,10,15,20,30,40,60,80,100),
lot1 = c(118,58,42,35,27,25,21,19,18))
glm.clotting <- glm(
lot1 ~ log(u),
data = clotting,
family = Gamma
)
summary(glm.clotting)
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.0165544 0.0009275 -17.85 4.28e-07 ***
log(u) 0.0153431 0.0004150 36.98 2.75e-09 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
I assume these refer to one of the parameters of the fitted gamma distribution, but I have not been able to find a clear explanation. Do these values refer to the gamma distribution mean $\mu = k\theta = \frac{\alpha}{\beta}$, the rate parameter $\beta$, or the scale parameter $\theta$?
Also, how should these estimates be interpreted differently for a continuous independent variable (as in the example) versus a discrete independent variable?
Best Answer
It will help other readers to point out that this data is an example from the stats package documentation page `help("glm")`, and you can run the example at the R prompt with `example(glm)`.
The data you show was originally published by Hurn et al. (1945), and the glm analysis was originally proposed in Section 8.4.2 of McCullagh and Nelder (1989). The variable `u` is percent concentration of normal blood plasma and `lot1` is blood clotting time in seconds (not a frequency measure as you seem to think). Generally speaking, the higher the concentration of blood plasma, the faster the clotting.
The coefficients of a glm always relate to the mean $\mu$, by way of the assumed link function. The default link for a gamma glm is the inverse link, so the model that has been fitted is $$\frac{1}{\mu_i}=\beta_1+x_i\beta_2$$ where $\mu_i$ represents the mean of the $i$th observation and $x_i=\log(u_i)$. Here $\beta_1$ and $\beta_2$ are regression coefficients on the link scale, not the rate or scale parameter of the gamma distribution. The results from the fitted model show that $\hat\beta_1=-0.017$ and $\hat\beta_2=0.015$.
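As a minimal sketch of how the coefficients map to the mean, the following refits the example from the question and inverts the link by hand. The names `b`, `eta` and `mu_by_hand` are introduced here purely for illustration; the check relies on the default inverse link described above.

```r
# Refit the documentation example (data copied from the question).
clotting <- data.frame(
  u    = c(5, 10, 15, 20, 30, 40, 60, 80, 100),
  lot1 = c(118, 58, 42, 35, 27, 25, 21, 19, 18)
)
glm.clotting <- glm(lot1 ~ log(u), data = clotting, family = Gamma)

# The coefficients live on the link scale: 1 / mu_i = b[1] + b[2] * log(u_i).
b   <- coef(glm.clotting)
eta <- b[1] + b[2] * log(clotting$u)  # linear predictor
mu_by_hand <- 1 / eta                 # undo the inverse link

# The hand-computed means should agree with R's fitted means.
same_means <- all.equal(unname(mu_by_hand), unname(fitted(glm.clotting)))
print(same_means)
```

Read this way, a one-unit increase in $\log(u)$ raises $1/\mu$ by $\hat\beta_2$, so with a positive $\hat\beta_2$ the mean clotting time falls as concentration rises. For new concentrations, `predict(glm.clotting, newdata = ..., type = "response")` returns means on the same scale.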
By the way, you have omitted some of the output from `summary(glm.clotting)`. The next line of output reports the dispersion parameter, which tells you that $1/\hat\alpha=0.0024$, because the dispersion of a gamma glm is the reciprocal of the shape parameter $\alpha$.
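To connect this to the parameters asked about in the question, here is a hedged sketch that pulls the dispersion out of the summary object and converts it to a shape, rate and scale. The names `phi`, `alpha_hat`, `rate_hat` and `scale_hat` are illustrative only, and the conversion assumes the parameterisation in the question, $\mu = \alpha/\beta = k\theta$ with $k=\alpha$. Note that `summary.glm` reports a Pearson-based (moment) estimate of the dispersion, not a maximum likelihood estimate.

```r
# Refit the documentation example (data copied from the question).
clotting <- data.frame(
  u    = c(5, 10, 15, 20, 30, 40, 60, 80, 100),
  lot1 = c(118, 58, 42, 35, 27, 25, 21, 19, 18)
)
glm.clotting <- glm(lot1 ~ log(u), data = clotting, family = Gamma)

# Dispersion as reported by summary.glm; its reciprocal is the gamma shape.
phi       <- summary(glm.clotting)$dispersion
alpha_hat <- 1 / phi               # one shape shared by all observations

# The mean varies by observation, so the rate and scale do too.
mu_hat    <- fitted(glm.clotting)
rate_hat  <- alpha_hat / mu_hat    # rate_i  = alpha / mu_i
scale_hat <- mu_hat / alpha_hat    # scale_i = mu_i / alpha

print(phi)
print(alpha_hat)
```

So the regression coefficients determine $\mu_i$, the dispersion determines the shape, and the rate or scale for each observation follows from those two.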
So that explains this classic glm example dataset. If you want to learn about glms with categorical factors as predictors, then I suggest you post a new question in the context of a dataset that has categorical predictors. There is little point in trying to learn about factors and how they enter into linear models from the current dataset. I will point out that, when you type `example(glm)`, the very first example (from Dobson 1990) involves two factors in a Poisson glm.
References
Hurn, M., Barker, N.W. and Magath, T.B. (1945). The determination of prothrombin time following the administration of dicumarol, 3,3'-methylenebis (4-hydroxycoumarin), with special reference to thromboplastin. Journal of Laboratory and Clinical Medicine, 30, 432-447.
McCullagh, P. and Nelder, J.A. (1989). Generalized linear models (2nd ed.). Chapman and Hall, London.