Interpreting results from Generalized Linear Model, gamma family, log-link

gamma distribution, generalized linear model

I have a small number of observations, and the data are continuous and very skewed. I decided to analyze them with a generalized linear model (Gamma family, log link). I'm having a hard time interpreting the result, and I want to make sure I'm doing it right.
Here is the result summary:

glm(formula = FD ~ DRS + Fine * CloudX + TempX + HumX + WindX, 
    family = Gamma(link = "log"), data = data)
Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.2968  -0.5690  -0.2043   0.1785   1.7283  

Coefficients:
                       Estimate Std. Error t value Pr(>|t|)    
(Intercept)           3.1099192  0.9733979   3.195  0.00176 ** 
DRS                  -4.4536685  2.1644974  -2.058  0.04163 *  
Fine                  0.0035130  0.0008545   4.111 6.93e-05 ***
CloudXScattered       0.5738074  0.3933730   1.459  0.14706    
TempX                 0.0404227  0.0266489   1.517  0.13173    
HumX                 -0.0027362  0.0097564  -0.280  0.77958    
WindX                 0.0072082  0.0338432   0.213  0.83167    
Fine:CloudXScattered -0.0271858  0.0128572  -2.114  0.03639 *  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for Gamma family taken to be 0.5475776)

    Null deviance: 72.231  on 137  degrees of freedom
Residual deviance: 59.658  on 130  degrees of freedom
AIC: 1300.1

Number of Fisher Scoring iterations: 6

The dependent variable is FD, measured in minutes.

Q1. The effect of DRS on FD is exp(-4.45)=0.012. Since it is multiplicative, does a 1 unit increase in DRS multiply FD by 0.012? DRS is a unitless decimal. Then what does a '1 unit increase' mean here? 1, 0.1, or 0.001?

Q2. The effect of the interaction term Fine:CloudXScattered is exp(-0.027)=0.97. The variable 'Cloud' is categorical with two values: Overcast and Scattered. So the effect of Fine with Cloud Scattered on the dependent variable is 0.97 less than that of Fine with Cloud Overcast. Am I right?

Q3. I'd like to test the goodness of fit of the model.

> pchisq(d, df = gamma$df.residual, lower.tail = F)
[1] 0.9999999

I'm not sure whether the value can be as high as 0.99, but can I assume that my model fits very well?

Q4. I think a GLM with Gamma family and log link is the best choice for my dataset, but what is the difference between the log link and the identity link? Is the log link multiplicative and the identity link additive?

Please forgive the noob question..!

Best Answer

First, to be clear about the model that's been fit, you're modeling FD as following a Gamma distribution with the mean of the distribution defined as

$$\log(\mu) = \beta_{0} + \beta_{1}x_{1} + \ldots + \beta_{p}x_{p}$$

Leading to

$$ \mu = \exp(\beta_{0} + \beta_{1}x_{1} + \ldots + \beta_{p}x_{p})$$

$$ = \exp(\beta_{0}) \exp(\beta_{1}x_{1}) \ldots \exp(\beta_{p}x_{p})$$
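The relationship between the linear predictor and the mean can be checked on simulated data. The sketch below is only an illustration: the data, the variable names (`x1`, `y`, `fit`) and the true coefficients are made up and have nothing to do with the dataset in the question.

```r
# Simulated illustration only; every name and value here is made up.
set.seed(1)
n     <- 500
x1    <- runif(n)
mu    <- exp(1 + 0.5 * x1)      # true mean: exp(beta0 + beta1 * x1)
shape <- 2                      # Gamma shape; the dispersion is 1 / shape
y     <- rgamma(n, shape = shape, rate = shape / mu)  # mean = shape / rate = mu

fit <- glm(y ~ x1, family = Gamma(link = "log"))

# The fitted mean is exp() of the fitted linear predictor
eta    <- predict(fit, type = "link")
mu_hat <- predict(fit, type = "response")
print(all.equal(unname(mu_hat), unname(exp(eta))))

# exp(coefficient) is the multiplicative change in the mean per unit of x1
print(exp(coef(fit)))
```

Because the exponential of a sum is a product of exponentials, each `exp(coef)` acts as a multiplier on the mean, which is what the factorised form above expresses.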

  1. Yes, in the sense that a 1 unit increase in DRS multiplies the mean of FD by 0.012. Put another way, increasing DRS by 1 is associated with a 98.8% reduction in mean FD. The scale is exactly the one used when DRS was entered in the data, so it may not be meaningful to talk about a 1 unit increase in DRS. But you can easily plug in a step size that is meaningful and see what the effect will be. Increasing DRS by 0.01 leads to a change of $*\exp(-4.45*0.01) = *0.9565$. (We can confirm that changing DRS by 0.01 one hundred times gives $0.9565^{100} \approx 0.012$, as we expect.)

  2. I can't quite tell what you mean. The interaction coefficient changes the slope of Fine rather than acting as an effect on its own. When Cloud is Scattered the effect of Fine is $\exp((0.0035 - 0.0272)*Fine) = \exp(-0.0237*Fine)$. When Cloud is Overcast the effect of Fine is $\exp(0.0035*Fine)$.

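  3. Not quite. Assuming `d` is the residual deviance, that call compares it with a chi-squared distribution on the residual degrees of freedom. A p-value near 1 then means there is no evidence of lack of fit; it does not show that the model fits very well. For a Gamma model the deviance should also be divided by the estimated dispersion (0.5475776 here) before making that comparison, and the chi-squared approximation is only rough. To ask whether the model explains something in the data, compare it with the "null model", which assumes a gamma distribution with the same mean for every observation: the relevant quantity is the drop in deviance from 72.231 to 59.658 on 7 degrees of freedom, scaled by the dispersion.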

  4. The identity link would model the mean of the Gamma as $\mu = \beta_{0} + \beta_{1}x_{1} + \ldots + \beta_{p}x_{p}$. That would give an additive model with the mean being linearly affected by the data.
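The arithmetic behind the four points above can be reproduced from the numbers printed in the summary. This is a sketch, not a replacement for working with the fitted object: the variable names are hypothetical, the F test assumes the printed dispersion is the appropriate scale estimate, and the coefficients used for the link comparison in point 4 are made up.

```r
# Coefficients and deviances copied from the summary output in the question
b_drs   <- -4.4536685
b_fine  <-  0.0035130
b_cloud <-  0.5738074   # CloudXScattered
b_int   <- -0.0271858   # Fine:CloudXScattered
null_dev   <- 72.231; null_df  <- 137
resid_dev  <- 59.658; resid_df <- 130
dispersion <- 0.5475776

# Point 1: multiplier on mean FD for several step sizes in DRS
delta    <- c(1, 0.1, 0.01)
drs_mult <- exp(b_drs * delta)
print(data.frame(delta, multiplier = drs_mult,
                 pct_change = 100 * (drs_mult - 1)))

# Point 2: multiplier on mean FD per one-unit increase in Fine, by Cloud level
fine_mult <- exp(c(Overcast = b_fine, Scattered = b_fine + b_int))
print(fine_mult)
# Ratio of mean FD, Scattered vs Overcast, at a given Fine (other terms equal)
cloud_ratio <- function(fine) exp(b_cloud + b_int * fine)

# Point 3: F test for the drop in deviance relative to the null model
df_diff <- null_df - resid_df
f_stat  <- ((null_dev - resid_dev) / df_diff) / dispersion
p_null  <- pf(f_stat, df_diff, resid_df, lower.tail = FALSE)
# Rough lack-of-fit check: residual deviance scaled by the dispersion
scaled_dev <- resid_dev / dispersion
p_gof      <- pchisq(scaled_dev, df = resid_df, lower.tail = FALSE)
print(c(F = f_stat, p_null = p_null, scaled_dev = scaled_dev, p_gof = p_gof))

# Point 4: log link vs identity link, using made-up coefficients 1 and 0.5
x          <- 0:3
mu_log     <- exp(1 + 0.5 * x)                     # log link
mu_id      <- 1 + 0.5 * x                          # identity link
log_ratios <- mu_log[-1] / mu_log[-length(mu_log)] # constant ratio exp(0.5)
id_diffs   <- diff(mu_id)                          # constant difference 0.5
print(rbind(log_ratios, id_diffs))
```

With the fitted model available, the comparison in point 3 would more usually be run as `anova(update(fit, . ~ 1), fit, test = "F")`, where `fit` is a hypothetical name for the model object. One practical difference for point 4 is that the identity link does not keep the fitted mean positive, whereas the log link does.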