I have used the betareg package in R to fit a regression. My question is: how do I calculate confidence intervals for betaregression in R?
Solved – Confidence intervals for beta regression
beta-regression
Related Solutions
The binomial is for modeling Bernoulli variables (i.e., binary) or binomial variables (i.e., the number of successes from a certain number of independent trials). So this should not be applied to computed rates (successes divided by trials) directly but glm()
wants you to supply a matrix with successes and failures. Consequently, your glm()
call above yields the warning:
Warning message:
In eval(expr, envir, enclos) : non-integer #successes in a binomial glm!
The beta regression model, on the other hand, is intended for situations where you only have a direct rate that does not correspond to success rates from a known number of independent trials. It uses a different likelihood and hence can lead to different results. Specifically, it has an additional precision parameter which is related to the variance of the observations.
Thus, if your proportions above come from a known number of independent trials, then supply this information and use a binomial GLM. Otherwise you can consider beta regression.
Additional remark: As your Y
above supplies proportions directly, the binomial likelihood does not fit. Specifically, the variance of the observations will be overestimated. If you use a quasi-binomial with an additional dispersion parameter, the model still won't be really appropriate but much closer to the beta regression results.
R> summary(betareg(Y ~ X))
Call:
betareg(formula = Y ~ X)
Standardized weighted residuals 2:
Min 1Q Median 3Q Max
-1.7480 -0.8042 -0.1105 0.8864 1.8896
Coefficients (mean model with logit link):
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.29444 0.08715 3.378 0.000729 ***
X 0.27270 0.09068 3.007 0.002637 **
Phi coefficients (precision model with identity link):
Estimate Std. Error z value Pr(>|z|)
(phi) 41.06 15.92 2.579 0.0099 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Type of estimator: ML (maximum likelihood)
Log-likelihood: 15.15 on 3 Df
Pseudo R-squared: 0.4149
Number of iterations: 34 (BFGS) + 2 (Fisher scoring)
R> summary(glm(Y ~ X, family = quasibinomial))
Call:
glm(formula = Y ~ X, family = quasibinomial)
Deviance Residuals:
Min 1Q Median 3Q Max
-0.25696 -0.11263 -0.01107 0.13491 0.25792
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.29284 0.09523 3.075 0.0106 *
X 0.27078 0.09910 2.732 0.0195 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for quasibinomial family taken to be 0.02836306)
Null deviance: 0.52867 on 12 degrees of freedom
Residual deviance: 0.31489 on 11 degrees of freedom
AIC: NA
Number of Fisher Scoring iterations: 3
The pseudo R-squared is the one suggested by Ferrari & Cribari-Neto for beta regressions: the squared correlation between between the linear predictor for the mean and the link-transformed response. See: calculating a pseudo R2 value when deviance is negative
And for the regressor matrix the usual diagnostics, e.g. based on condition numbers erc., can be applied.
Best Answer
The beta likelihood is not a regular exponential family, so constructing interval estimates for such two parameter families is not easily done. I think Zeileis was wise not to implement any de-facto methods for
confint
. The cited article Ospina suggests that bootstrap interval estimates perform best. The packageboot
has some methods, but bootstrapping is also easily done "by-hand" in R.