While fitting a beta regression with the betareg R package, I noticed that the terms in my model are often significant, even with very small sample sizes. I tried the same model using glm with the binomial family and logit link function, and I get very similar effect sizes but non-significant terms.
Can someone explain how I should interpret this? Do the two models test significance in different ways?
NOTE: In my case the response variable is a proportion, so, although extremely unlikely, it could even take values 0 and 1.
library(betareg)
Y <- c(0.5283019, 0.4845361, 0.4974874, 0.6884735, 0.5967742, 0.6835443, 0.4152047, 0.4949495,
       0.6478873, 0.7695853, 0.4764398, 0.5780591, 0.5689655)
X <- c(0.3616452, -0.4931525, 0.7890441, 0.7890441, -0.9205514, 0.7890441, -0.9205514,
       -0.9205514, 1.2164429, 1.2164429, -1.3479503, -1.3479503, 0.7890441)
summary(glm(Y ~ X, family = binomial("logit")))
summary(betareg(Y ~ X))
Best Answer
The binomial family is for modeling Bernoulli variables (i.e., binary) or binomial variables (i.e., the number of successes from a certain number of independent trials). So it should not be applied directly to computed rates (successes divided by trials); instead, glm() wants you to supply a matrix with successes and failures. Consequently, your glm() call above yields a warning about non-integer numbers of successes ("non-integer #successes in a binomial glm!").
The beta regression model, on the other hand, is intended for situations where you only have a direct rate that does not correspond to success rates from a known number of independent trials. It uses a different likelihood and hence can lead to different results. Specifically, it has an additional precision parameter that, together with the mean, determines the variance of the observations.
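To make the role of the precision parameter concrete, here is a short sketch of the two variance functions, assuming the mean-precision parametrization used by betareg (mean $\mu$, precision $\phi$) and writing $n$ for the number of trials behind a binomial proportion:

```latex
% Beta regression: y ~ Beta(mu, phi) in the mean-precision parametrization
\operatorname{E}(y) = \mu, \qquad
\operatorname{Var}(y) = \frac{\mu (1 - \mu)}{1 + \phi}

% Binomial proportion: y = s / n with s ~ Binomial(n, mu)
\operatorname{E}(y) = \mu, \qquad
\operatorname{Var}(y) = \frac{\mu (1 - \mu)}{n}
```

In the beta model $\phi$ is estimated from the data, whereas in the binomial model the variance is fixed by $n$. Passing bare proportions to glm() without weights amounts to treating every observation as if $n = 1$.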
Thus, if your proportions above come from a known number of independent trials, then supply this information and use a binomial GLM. Otherwise you can consider beta regression.
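If the trial counts were known, the binomial fit could look roughly like the sketch below. The question does not report the number of trials, so `n_trials` is a purely hypothetical placeholder (100 trials per observation), and `successes` and `failures` are derived from it only for illustration.

```r
# Data from the question
Y <- c(0.5283019, 0.4845361, 0.4974874, 0.6884735, 0.5967742, 0.6835443, 0.4152047, 0.4949495,
       0.6478873, 0.7695853, 0.4764398, 0.5780591, 0.5689655)
X <- c(0.3616452, -0.4931525, 0.7890441, 0.7890441, -0.9205514, 0.7890441, -0.9205514,
       -0.9205514, 1.2164429, 1.2164429, -1.3479503, -1.3479503, 0.7890441)

# ASSUMPTION: hypothetical number of trials per observation (not given in the question)
n_trials <- rep(100, length(Y))

# Convert proportions back to integer counts of successes and failures
successes <- round(Y * n_trials)
failures <- n_trials - successes

# Binomial GLM on a two-column matrix of successes and failures
fit_binom <- glm(cbind(successes, failures) ~ X, family = binomial("logit"))
summary(fit_binom)
```

With real data you would replace `n_trials` by the actual denominators. The standard errors depend directly on those counts, so the placeholder values above say nothing about the significance in the original data.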
Additional remark: As your Y above supplies proportions directly, the binomial likelihood does not fit. Specifically, the variance of the observations will be overestimated, which inflates the standard errors and explains the non-significant terms. If you use a quasi-binomial family with an additional dispersion parameter, the model still won't be really appropriate, but its results will be much closer to those of the beta regression.
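The quasi-binomial comparison can be checked with a few lines, using the data from the question. This is only a sketch for putting the two coefficient tables side by side; the object names `fit_quasi` and `fit_beta` are chosen here for illustration.

```r
library(betareg)

# Data from the question
Y <- c(0.5283019, 0.4845361, 0.4974874, 0.6884735, 0.5967742, 0.6835443, 0.4152047, 0.4949495,
       0.6478873, 0.7695853, 0.4764398, 0.5780591, 0.5689655)
X <- c(0.3616452, -0.4931525, 0.7890441, 0.7890441, -0.9205514, 0.7890441, -0.9205514,
       -0.9205514, 1.2164429, 1.2164429, -1.3479503, -1.3479503, 0.7890441)

# Quasi-binomial GLM: same mean model, but the dispersion is estimated from the data
fit_quasi <- glm(Y ~ X, family = quasibinomial("logit"))

# Beta regression with the default logit link for the mean
fit_beta <- betareg(Y ~ X)

# Estimated dispersion of the quasi-binomial fit
summary(fit_quasi)$dispersion

# Coefficient tables (estimates, standard errors, test statistics, p-values)
summary(fit_quasi)$coefficients
summary(fit_beta)$coefficients$mean
```

An estimated dispersion well below 1 indicates that the plain binomial fit assumed far more variance than the data show, which is why its standard errors were so much larger.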