Solved – Significance in beta regression and glm binomial

beta-regressionbinomial distributiongeneralized linear modelrstatistical significance

While performing betaregression using betareg R package I noticed that the terms in my model are often significant, even with very small sample sizes. I tried the same model using glm with binomial family and logit link function, and I get very similar effect sizes but non-significant terms.

Can someone explain me how should I interpret this? Do the two models test significance in different ways?

NOTE: In my case the response variable is a proportion, so, although extremely unlikely, it could even take values 0 and 1.

library(betareg)

Y=c(0.5283019, 0.4845361, 0.4974874, 0.6884735, 0.5967742, 0.6835443, 0.4152047, 0.4949495,
  0.6478873, 0.7695853, 0.4764398, 0.5780591, 0.5689655)
X=c(0.3616452, -0.4931525,  0.7890441,  0.7890441, -0.9205514,  0.7890441, -0.9205514,
 -0.9205514,  1.2164429,  1.2164429, -1.3479503, -1.3479503,  0.7890441)

summary(glm(Y~X, family=binomial('logit')))
summary(betareg(Y~X))

Best Answer

The binomial is for modeling Bernoulli variables (i.e., binary) or binomial variables (i.e., the number of successes from a certain number of independent trials). So this should not be applied to computed rates (successes divided by trials) directly but glm() wants you to supply a matrix with successes and failures. Consequently, your glm() call above yields the warning:

Warning message:
In eval(expr, envir, enclos) : non-integer #successes in a binomial glm!

The beta regression model, on the other hand, is intended for situations where you only have a direct rate that does not correspond to success rates from a known number of independent trials. It uses a different likelihood and hence can lead to different results. Specifically, it has an additional precision parameter which is related to the variance of the observations.

Thus, if your proportions above come from a known number of independent trials, then supply this information and use a binomial GLM. Otherwise you can consider beta regression.

Additional remark: As your Y above supplies proportions directly, the binomial likelihood does not fit. Specifically, the variance of the observations will be overestimated. If you use a quasi-binomial with an additional dispersion parameter, the model still won't be really appropriate but much closer to the beta regression results.

R> summary(betareg(Y ~ X))

Call:
betareg(formula = Y ~ X)

Standardized weighted residuals 2:
    Min      1Q  Median      3Q     Max 
-1.7480 -0.8042 -0.1105  0.8864  1.8896 

Coefficients (mean model with logit link):
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  0.29444    0.08715   3.378 0.000729 ***
X            0.27270    0.09068   3.007 0.002637 ** 

Phi coefficients (precision model with identity link):
      Estimate Std. Error z value Pr(>|z|)   
(phi)    41.06      15.92   2.579   0.0099 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

Type of estimator: ML (maximum likelihood)
Log-likelihood: 15.15 on 3 Df
Pseudo R-squared: 0.4149
Number of iterations: 34 (BFGS) + 2 (Fisher scoring) 

R> summary(glm(Y ~ X, family = quasibinomial))

Call:
glm(formula = Y ~ X, family = quasibinomial)

Deviance Residuals: 
     Min        1Q    Median        3Q       Max  
-0.25696  -0.11263  -0.01107   0.13491   0.25792  

Coefficients:
            Estimate Std. Error t value Pr(>|t|)  
(Intercept)  0.29284    0.09523   3.075   0.0106 *
X            0.27078    0.09910   2.732   0.0195 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for quasibinomial family taken to be 0.02836306)

    Null deviance: 0.52867  on 12  degrees of freedom
Residual deviance: 0.31489  on 11  degrees of freedom
AIC: NA

Number of Fisher Scoring iterations: 3