Solved – Non normal residuals for Tweedie GLM

generalized linear model, nonparametric, residuals, tweedie-distribution

I am using a Tweedie GLM because my data contain exact zeroes. However, my statistics background is weak, so I want to confirm a few things.

  1. Does Tweedie GLM assume normality of residuals?
  2. Is shapiro.test() the right way to check the normality of residuals from a Tweedie GLM?
  3. If neither the data nor the residuals are normally distributed, can I still use the "glht" function for post hoc analyses?

Here is the histogram of the response variable: [image not included]. I tried transforming the response variable but was not able to, so I have used the values as they were obtained from the dataset.

Here is the code:

    require(statmod)
    require(tweedie)
    c0 <- tweedie.profile(y~x, data = c, p.vec = seq(1.0, 2.0, 0.01), method = "series")
    c0$p.max
    c1 <- glm(y~x, data = c, family = tweedie(var.power = 1.11, link.power = 0))
    summary(c1)
    shapiro.test(residuals(c1))

        Shapiro-Wilk normality test

    data:  residuals(c1)
    W = 0.81176, p-value < 2.2e-16

Residuals are not normal.

  1. Is the code correct?
  2. Is a Tweedie GLM a reasonable option for a dataset like mine?

Any suggestions welcome. Thanks.

Best Answer

  1. No, a Tweedie GLM assumes that the responses follow a Tweedie distribution so, obviously, neither the data nor the ordinary residuals are expected to follow a normal distribution.
  2. No, a Shapiro-Wilk test is not at all appropriate. The only practical way to examine residuals from a GLM such as this is to plot the quantile residuals. Unlike other types of residuals, the quantile residuals are normally distributed when the model is correct, even when y follows a mixed discrete-continuous distribution as in this case. For example, make a normal probability plot of the residuals:

    res <- qresiduals(c1)
    qqnorm(res)
    qqline(res)
    

    A plot of the residuals against the covariate would also be useful:

    plot(c$x, res)  # x is a column of the data frame c used in the fit
    

    Note that these plots are examining whether your fitted model is appropriate as much as they are examining the distribution of y. If the second plot shows a pattern, then that would suggest you might need more or different predictors in your model.

  3. glht claims to work for any GLM, so presumably it will run on a Tweedie GLM. But there seems to be no reason why you need the glht function. It is easy to test the significance of your model using standard GLM functions in R:

    summary(c1)
    anova(c1, test="F")
    

    Why make the analysis more complicated than necessary?

  4. Your code looks ok in principle, but obviously we can't vouch for whether your analysis is completely correct from the limited information you've given.

  5. Yes, definitely. From the limited information you've given, this seems the sort of data that Tweedie GLMs are intended for. I might change my mind if you explained the physical meaning of your data, for example what your response variable actually is and what leads to exact zeros but, from what you've said so far, the Tweedie model seems appropriate.
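
As a rough illustration of the diagnostics in point 2, here is a minimal sketch on simulated data. Everything specific in it is an assumption made for illustration, not the asker's data: the names `sim`, `fit` and `qres`, the sample size, and the "true" values `p = 1.5` and `phi = 2`. It simulates a compound Poisson-gamma response (which is what a Tweedie distribution with 1 < p < 2 is), fits the GLM, and draws the two plots. It needs the statmod package and, as far as I know, `qresiduals()` on a Tweedie fit also relies on the tweedie package being installed.

```r
library(statmod)   # tweedie() family and qresiduals()
library(tweedie)   # loaded as in the question; supplies the Tweedie CDF

set.seed(1)        # quantile residuals are randomized at exact zeros

# Assumed "true" values, chosen only for this illustration
n   <- 500
p   <- 1.5
phi <- 2
x   <- runif(n, 0, 3)
mu  <- exp(0.5 + 0.3 * x)                 # log link, i.e. link.power = 0

# Compound Poisson-gamma representation of a Tweedie variable (1 < p < 2)
lambda <- mu^(2 - p) / (phi * (2 - p))    # Poisson rate
shape  <- (2 - p) / (p - 1)               # gamma shape per event
scale  <- phi * (p - 1) * mu^(p - 1)      # gamma scale

N   <- rpois(n, lambda)
y   <- numeric(n)                         # stays exactly 0 where N == 0
pos <- N > 0
y[pos] <- rgamma(sum(pos), shape = N[pos] * shape, scale = scale[pos])

sim <- data.frame(x = x, y = y)
fit <- glm(y ~ x, data = sim,
           family = tweedie(var.power = p, link.power = 0))

qres <- qresiduals(fit)

# Normal probability plot: points should lie close to the line
qqnorm(qres); qqline(qres)

# Residuals against the covariate: look for the absence of any pattern
plot(sim$x, qres); abline(h = 0, lty = 2)
```

Because the simulated model is the correct one here, both plots should look unremarkable. Refitting with a deliberately wrong formula (for example `y ~ 1`) is a quick way to see what a problematic pattern looks like.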

By the way, I assume that you have set var.power=1.11 because that was the estimate from c0$p.max.
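
To avoid copying the estimate by hand, the profile estimate can be passed straight to the family. The sketch below is only an illustration on simulated data: the names `sim`, `prof` and `fit`, the true power of 1.5, and the coarse `p.vec` grid are my own assumptions, and it relies only on the calls already used in the question (`tweedie.profile()`, `$p.max`, `tweedie()`).

```r
library(statmod)
library(tweedie)

set.seed(2)

# Simulated compound Poisson-gamma data (illustration only, not the asker's data)
n <- 300; p <- 1.5; phi <- 2
x  <- runif(n, 0, 3)
mu <- exp(0.5 + 0.3 * x)
N  <- rpois(n, mu^(2 - p) / (phi * (2 - p)))
y  <- numeric(n)                          # exact zeros where N == 0
y[N > 0] <- rgamma(sum(N > 0),
                   shape = N[N > 0] * (2 - p) / (p - 1),
                   scale = (phi * (p - 1) * mu^(p - 1))[N > 0])
sim <- data.frame(x = x, y = y)

# Profile likelihood over a coarse grid of candidate powers (kept small for speed)
prof <- tweedie.profile(y ~ x, data = sim,
                        p.vec = seq(1.2, 1.8, by = 0.1), method = "series")

# Use the estimate directly instead of typing a rounded value
fit <- glm(y ~ x, data = sim,
           family = tweedie(var.power = prof$p.max, link.power = 0))
summary(fit)
```

Passing `prof$p.max` keeps the fitted model in step with the profile if the data or the grid change later.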