Solved – post-hoc test for betareg model R

beta-regressionpost-hocr

I'm currently analyzing a dataset about quality of fodder. The data for e.g. crude protein content are given in %, I don't have absolute values. Because of the data structure (I don't give you the details here), I opted for the beta-regression from R's betareg package.
I want to know the influence of various factors (and their interactions) on the crude protein content. Some factors have two levels (e.g. fertilization: with/without), others have three or more (e.g. site: site1, site2, site3 or the harvest time of the year where my samples come from (cut: spring, summer, autumn).
I have specified a model that works quite well on the data I have, residuals look well-behaved, predictors and model parameters are good, etc.
Now I would love to have some more information than the summary(mymodel) gives. I would like to test not only if other factor levels are different from my reference level but if they are significantly different from each other. In my example that would be: On which of my sites is the crude protein level in the fodder larger than on others and are those differences all significant?
I would think about doing a post-hoc test similar to e.g. the Tukey I could do if I had done a simple ANOVA. I googled, searched a lot online and asked people, but I can't find anything that works with betareg-objects.

So my question is: How do I do a post-hoc test for parameters of betareg-models in R? Or is this a bad idea? If so, why? Or is there no method yet? Please help me, thank you a lot!

I'm not a statistics/math pro, I learned almost everything I know by the modelling books by Alain Zuur and this site, so please be gentle.

My model (in the simplified version of my dataset) is specified like this:

library(betareg)
prot<-protein/100
mymodel<-betareg(prot~ sward * Fert + site + cut + repetition | site +cut, data=mydata, link="loglog")

str(mydata)
 'data.frame':   848 obs. of  58 variables:
  $ protein     num  16.4 16.5 13.7 13.5 15 ...
  $ cut          : Factor w/ 3 levels "autumn","spring",..: 1 1 2 2 3 3 1 2 2 3 ...
  $ loc          : Factor w/ 3 levels "G","O","S": 1 1 1 1 1 1 1 1 1 1 ...
  $ repetition   : Factor w/ 4 levels "I","II","III",..: 1 1 1 1 1 1 1 1 1 1 ...
  $ Fert         : Factor w/ 2 levels "fertilized","non-fertilized": 1 1 1 1 1 1 1 1 1 1 ...
  $ sward        : Factor w/ 2 levels "diverse","species-poor": 1 1 1 1 1 1 1 1 1 1 ...

a reproducible but very simplified example code would be (note that my actual model is much larger!):

site<-c("site1","site2","site3","site1","site2","site3","site1","site2","site3","site2")
prot<-c(0.1642038, 0.1650442, 0.1369376, 0.1350139, 0.1502178, 0.1515794, 0.1354457, 0.1301206, 0.1311298, 0.1308463)
fert<-c("with","without","with","without","with","without","with","without","with","without")
cut<-c("autumn","autumn","spring", "spring", "summer", "summer", "autumn", "spring", "spring", "summer")

mymodel<-betareg(prot~ fert + site + cut | site, link="loglog")

Best Answer

Post-hoc testing for beta regressions works in the same way that it does for other maximum likelihood (regression) models. In R there are various packages with object-oriented implementations of such procedures, e.g., lmtest, car, multcomp among others.

For testing individual hypotheses it is probably easiest to use linearHypothesis() from car. For example, for equality of the second and third of the site effects in the mean regression:

linearHypothesis(mymodel, "sitesite2 = sitesite3")
## Linear hypothesis test
## 
## Hypothesis:
## sitesite2 - sitesite3 = 0
## 
## Model 1: restricted model
## Model 2: prot ~ fert + site + cut | site
## 
##   Res.Df Df  Chisq Pr(>Chisq)
## 1      2                     
## 2      1  1 2.2491     0.1337

And for the dispersion (phi) submodel, the coefficients have to be prefixed with (phi)_ as shown in coef(mymodel):

linearHypothesis(mymodel, "(phi)_sitesite2 = (phi)_sitesite3")
## Linear hypothesis test
## 
## Hypothesis:
## phi)_sitesite2 - phi)_sitesite3 = 0
## 
## Model 1: restricted model
## Model 2: prot ~ fert + site + cut | site
## 
##   Res.Df Df  Chisq Pr(>Chisq)  
## 1      2                       
## 2      1  1 4.7682    0.02899 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

For testing multiple hypotheses (e.g., Dunnett or Tukey type contrasts) one can use glht() from multcomp in a similar fashion.

summary(glht(mymodel, linfct = c("sitesite2 = 0", "sitesite3 = 0")))
##          Simultaneous Tests for General Linear Hypotheses
## 
## Fit: betareg(formula = prot ~ fert + site + cut | site, link = "loglog")
## 
## Linear Hypotheses:
##                Estimate Std. Error z value Pr(>|z|)
## sitesite2 == 0  0.01488    0.03448   0.431    0.806
## sitesite3 == 0  0.04103    0.03494   1.174    0.320
## (Adjusted p values reported -- single-step method)
summary(glht(mymodel, linfct = c("`(phi)_sitesite2` = 0", "`(phi)_sitesite3` = 0")))
##          Simultaneous Tests for General Linear Hypotheses
## 
## Fit: betareg(formula = prot ~ fert + site + cut | site, link = "loglog")
## 
## Linear Hypotheses:
##                      Estimate Std. Error z value Pr(>|z|)   
## (phi)_sitesite2 == 0    1.084      1.080   1.004  0.49442   
## (phi)_sitesite3 == 0    3.443      1.155   2.982  0.00552 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## (Adjusted p values reported -- single-step method)

Note that for glht() the phi coefficient names have to be quoted. Also the mcp() interface for constructing sets of hypotheses does not work for betareg because there are two submodels (mean and phi) and not just one.

Related Question