I have a small number of observation points, and the data are continuous and very skewed. I decided to analyze the data with a generalized linear model (GLM) with a Gamma family and a log link, but I'm having a hard time interpreting the results. I want to make sure I'm doing it right.
Here is the result summary:
glm(formula = FD ~ DRS + Fine * CloudX + TempX + HumX + WindX,
family = Gamma(link = "log"), data = data)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.2968 -0.5690 -0.2043 0.1785 1.7283
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.1099192 0.9733979 3.195 0.00176 **
DRS -4.4536685 2.1644974 -2.058 0.04163 *
Fine 0.0035130 0.0008545 4.111 6.93e-05 ***
CloudXScattered 0.5738074 0.3933730 1.459 0.14706
TempX 0.0404227 0.0266489 1.517 0.13173
HumX -0.0027362 0.0097564 -0.280 0.77958
WindX 0.0072082 0.0338432 0.213 0.83167
Fine:CloudXScattered -0.0271858 0.0128572 -2.114 0.03639 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for Gamma family taken to be 0.5475776)
Null deviance: 72.231 on 137 degrees of freedom
Residual deviance: 59.658 on 130 degrees of freedom
AIC: 1300.1
Number of Fisher Scoring iterations: 6
The dependent variable is FD, measured in minutes.
Q1. The effect of DRS on FD is exp(-4.45) = 0.012. Since it is multiplicative, does a 1-unit increase in DRS multiply FD by 0.012? DRS is unitless and takes decimal values, so what does a '1-unit increase' mean here: 1, 0.1, or 0.001?
Q2. The effect of the interaction term FineCloudScattered is exp(-0.027) = 0.97. The variable 'Cloud' is categorical with two values: Overcast and Scattered. So the effect of FineCloudScattered on the dependent variable is 0.97 less than that of FineCloudOvercast. Am I right?
Q3. I'd like to test the goodness of fit of the model.
> pchisq(d, df = gamma$df.residual, lower.tail = F)
[1] 0.9999999
I'm not sure whether the value can be as high as 0.99, but can I assume my model fits very well?
Q4. Though I think a Gamma GLM with a log link is the best choice for my dataset, what is the difference between the log link and the identity link? Is the log link multiplicative and the identity link additive?
Please forgive the noob question..!
Best Answer
First, to be clear about the model that's been fit, you're modeling FD as following a Gamma distribution with the mean of the distribution defined as
$$\log(\mu) = \beta_{0} + \beta_{1}x_{1} + \ldots + \beta_{p}x_{p}$$
Leading to
$$ \mu = \exp(\beta_{0} + \beta_{1}x_{1} + \ldots + \beta_{p}x_{p})$$
$$ = \exp(\beta_{0}) \exp(\beta_{1}x_{1}) \ldots \exp(\beta_{p}x_{p})$$
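To see this factorization numerically, here is a minimal Python sketch (Python is used only for illustration; the model itself was fit in R) that plugs the estimates from the summary into both forms of $\mu$ for one observation. The covariate values in `row` and the helper names (`COEF`, `design_row`, `mean_fd_sum`, `mean_fd_product`) are hypothetical, chosen only to show that the two forms agree.

```python
import math

# Estimates copied from the glm summary above (log link).
COEF = {
    "(Intercept)": 3.1099192,
    "DRS": -4.4536685,
    "Fine": 0.0035130,
    "CloudXScattered": 0.5738074,
    "TempX": 0.0404227,
    "HumX": -0.0027362,
    "WindX": 0.0072082,
    "Fine:CloudXScattered": -0.0271858,
}

def design_row(drs, fine, scattered, temp, hum, wind):
    """Build one model-matrix row; scattered is 1 for Scattered, 0 for Overcast."""
    return {
        "(Intercept)": 1.0,
        "DRS": drs,
        "Fine": fine,
        "CloudXScattered": scattered,
        "TempX": temp,
        "HumX": hum,
        "WindX": wind,
        "Fine:CloudXScattered": fine * scattered,
    }

def mean_fd_sum(row):
    """mu = exp(b0 + b1*x1 + ... + bp*xp)."""
    eta = sum(COEF[name] * x for name, x in row.items())
    return math.exp(eta)

def mean_fd_product(row):
    """mu = exp(b0) * exp(b1*x1) * ... * exp(bp*xp)."""
    return math.prod(math.exp(COEF[name] * x) for name, x in row.items())

# Hypothetical covariate values, purely for illustration.
row = design_row(drs=0.3, fine=20.0, scattered=1, temp=15.0, hum=60.0, wind=3.0)
print(mean_fd_sum(row), mean_fd_product(row))
```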
Q1. Yes: holding the other predictors fixed, a 1-unit increase in DRS multiplies the mean of FD by $\exp(-4.45) \approx 0.012$. Put another way, increasing DRS by 1 is associated with a 98.8% reduction in the mean of FD. The "unit" is simply the scale on which DRS was entered in the data, so a 1-unit increase may not be meaningful if DRS only varies over a small range. But you can easily plug in whatever step size is meaningful and see the effect. Increasing DRS by 0.01 leads to a change of $\times\exp(-4.45 \times 0.01) \approx \times 0.956$. (As a check, applying that 0.01 step one hundred times gives $\exp(-4.45 \times 0.01)^{100} = \exp(-4.45) \approx 0.012$, as expected.)
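The effect of different step sizes can be tabulated directly. A minimal sketch, assuming only the DRS estimate from the summary (`B_DRS` and `drs_factor` are hypothetical names): a step of size `delta` multiplies the mean of FD by `exp(B_DRS * delta)`, and repeated steps compose by multiplication.

```python
import math

B_DRS = -4.4536685  # DRS estimate from the glm summary (log scale)

def drs_factor(delta):
    """Multiplicative change in mean FD when DRS increases by delta."""
    return math.exp(B_DRS * delta)

for delta in (1, 0.1, 0.01, 0.001):
    factor = drs_factor(delta)
    print(f"DRS +{delta}: x{factor:.4f} ({(factor - 1) * 100:+.1f}%)")

# Applying a 0.01 step 100 times is the same as one 1-unit step.
assert math.isclose(drs_factor(0.01) ** 100, drs_factor(1))
```

A natural step to report is one that matches how much DRS actually varies in your data, such as its standard deviation or interquartile range.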
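Q2. I can't quite tell what you mean, so here is how the interaction works. When Cloud is Scattered, the effect of Fine is $\exp((0.0035 - 0.0272) \times Fine) = \exp(-0.0237 \times Fine)$. When Cloud is Overcast (the reference level), the effect of Fine is $\exp(0.0035 \times Fine)$. So each 1-unit increase in Fine multiplies the mean of FD by about $\exp(0.0035) \approx 1.0035$ under Overcast but by about $\exp(-0.0237) \approx 0.977$ under Scattered; $\exp(-0.0272) \approx 0.973$ is the ratio of these two per-unit factors, not a difference. Likewise, the CloudXScattered main effect, $\exp(0.574) \approx 1.78$, is the Scattered-to-Overcast ratio only at Fine = 0.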
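The interaction can be checked with the same kind of arithmetic. A minimal sketch using only the three relevant estimates from the summary (the constant and function names are hypothetical). It assumes Overcast is the reference level, which is what the `CloudXScattered` coefficient name implies, and holds the other predictors fixed.

```python
import math

B_FINE = 0.0035130               # Fine (slope when Cloud is Overcast)
B_SCATTERED = 0.5738074          # CloudXScattered (shift at Fine = 0)
B_FINE_X_SCATTERED = -0.0271858  # Fine:CloudXScattered (change in slope)

def fine_factor(scattered):
    """Multiplicative change in mean FD per 1-unit increase in Fine."""
    slope = B_FINE + (B_FINE_X_SCATTERED if scattered else 0.0)
    return math.exp(slope)

def scattered_vs_overcast(fine):
    """Ratio of mean FD (Scattered / Overcast) at a given Fine, other predictors fixed."""
    return math.exp(B_SCATTERED + B_FINE_X_SCATTERED * fine)

per_unit_overcast = fine_factor(scattered=False)  # about 1.0035
per_unit_scattered = fine_factor(scattered=True)  # about 0.977
# The ratio of the two per-unit factors equals exp(B_FINE_X_SCATTERED).
print(per_unit_overcast, per_unit_scattered, per_unit_scattered / per_unit_overcast)

for fine in (0, 10, 20, 30, 40):
    print(fine, scattered_vs_overcast(fine))
```

Under these estimates the Scattered/Overcast ratio crosses 1 around Fine ≈ 0.5738 / 0.0272 ≈ 21; whether that point lies inside the observed range of Fine is worth checking before reading much into it.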
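Q3. Not quite. Assuming d is the residual deviance, pchisq(d, df = gamma$df.residual, lower.tail = F) compares your model with the saturated model (one parameter per observation), not with the null model. A p-value near 1 only means the test finds no evidence of lack of fit; it does not show that the model fits very well. For a Gamma family the deviance should also be divided by the dispersion parameter (0.5475776 here) before it is compared with a chi-square distribution, and even then the approximation is rough. To check whether your predictors explain anything, compare the fitted model with the null model instead: the deviance drops from 72.231 to 59.658 on 137 - 130 = 7 degrees of freedom, which you can test with anova(null_fit, gamma, test = "F"), where null_fit is a hypothetical name for glm(FD ~ 1, family = Gamma(link = "log"), data = data).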
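The deviance figures in the summary are enough for a rough check of the tests discussed for Q3. This sketch assumes that d in the question is the raw residual deviance (the question does not define d), and it uses a closed-form chi-square tail for integer degrees of freedom instead of R's pchisq. It computes the saturated-model test with the raw and with the dispersion-scaled deviance, plus the comparison with the null model, treating the estimated dispersion as if it were known. The names (`chi2_sf`, `p_unscaled`, etc.) are hypothetical.

```python
import math

# Values read off the glm summary above.
NULL_DEV, NULL_DF = 72.231, 137
RESID_DEV, RESID_DF = 59.658, 130
DISPERSION = 0.5475776

def chi2_sf(x, k):
    """Upper-tail probability P(X > x) for a chi-square with integer df k (closed form)."""
    half = x / 2.0
    if k % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{k/2-1} (x/2)^i / i!
        term = math.exp(-half)
        total = term
        for i in range(1, k // 2):
            term *= half / i
            total += term
        return total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2) * sum_{j=0}^{(k-3)/2} x^j / (1*3*...*(2j+1))
    total = math.erfc(math.sqrt(half))
    if k > 1:
        term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
        acc = term
        for j in range(1, (k - 1) // 2):
            term *= x / (2 * j + 1)
            acc += term
        total += acc
    return total

# 1) Saturated-model test on the raw residual deviance (what the question's call computes if d = 59.658).
p_unscaled = chi2_sf(RESID_DEV, RESID_DF)
# 2) Same test on the scaled deviance (deviance / dispersion); still only a rough approximation.
p_scaled = chi2_sf(RESID_DEV / DISPERSION, RESID_DF)
# 3) Fitted model vs null model: drop in scaled deviance on 7 df (dispersion treated as known).
drop_df = NULL_DF - RESID_DF
p_vs_null = chi2_sf((NULL_DEV - RESID_DEV) / DISPERSION, drop_df)
# F statistic for the same comparison: (deviance drop / df) / dispersion.
f_stat = ((NULL_DEV - RESID_DEV) / drop_df) / DISPERSION

print(p_unscaled, p_scaled, p_vs_null, f_stat)
```

The F statistic here should correspond to what anova(null_fit, gamma, test = "F") reports in R, since that test divides the mean deviance drop by the estimated dispersion; its p-value comes from an F rather than a chi-square distribution, so it will differ somewhat from `p_vs_null`.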
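Q4. Yes. With the log link the coefficients act multiplicatively on the mean, as shown above. The identity link would model the mean of the Gamma as $\mu = \beta_{0} + \beta_{1}x_{1} + \ldots + \beta_{p}x_{p}$, which gives an additive model in which the mean changes linearly with the predictors. One practical drawback is that nothing keeps this linear predictor positive, while a Gamma mean must be positive; this is one reason the log link is a common choice for skewed, positive responses like yours.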
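A toy comparison of the two links makes the multiplicative/additive distinction concrete. The coefficients `B0` and `B1` below are hypothetical, not taken from the fit above. Under the log link a fixed step in x scales the mean by a fixed factor. Under the identity link the same step shifts the mean by a fixed amount, which can push it to zero or below, and that is impossible for a Gamma mean.

```python
import math

# Hypothetical coefficients for a single predictor, for illustration only.
B0, B1 = 3.0, -0.5

def mean_log_link(x):
    """Log link: log(mu) = B0 + B1*x, so mu = exp(B0 + B1*x)."""
    return math.exp(B0 + B1 * x)

def mean_identity_link(x):
    """Identity link: mu = B0 + B1*x."""
    return B0 + B1 * x

for x in range(0, 9, 2):
    # Log link: each +2 step multiplies the mean by exp(2*B1).
    # Identity link: each +2 step adds 2*B1 to the mean (reaching -1.0 at x = 8).
    print(x, mean_log_link(x), mean_identity_link(x))
```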