First let me give some background; I will summarize my questions at the end.
The Beta distribution, parameterized by its mean $\mu$ and precision $\phi$, has $\operatorname{Var}(Y) = \operatorname{V}(\mu)/(\phi+1)$, where $\operatorname{V}(\mu) = \mu(1-\mu)$ is the variance function.
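As a quick numerical check of this variance formula, here is a minimal sketch in base R. It assumes the usual mapping from $(\mu, \phi)$ to the shape parameters of `rbeta`, namely `shape1 = mu * phi` and `shape2 = (1 - mu) * phi`; the values of `mu`, `phi` and `n` are arbitrary choices for illustration and are not taken from the data discussed here.

```r
# Simulate Beta draws in the mean/precision parameterization and compare
# the sample variance with mu * (1 - mu) / (phi + 1).
set.seed(1)
mu  <- 0.3    # illustrative mean (assumption)
phi <- 10     # illustrative precision (assumption)
n   <- 1e5    # number of simulated draws

y <- rbeta(n, shape1 = mu * phi, shape2 = (1 - mu) * phi)

var_theory <- mu * (1 - mu) / (phi + 1)  # V(mu) / (phi + 1)
var_sample <- var(y)

print(c(theory = var_theory, sample = var_sample))
```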
In a beta regression (e.g., using the betareg package in R), the model assumes a beta-distributed response and estimates the regression coefficients and the value of $\phi$.
In a GLM fitted with glm, it is possible to define a "quasi" family with a variance function of $\mu(1-\mu)$, so the model assumes the same variance function as the Beta distribution. The regression then estimates the regression coefficients and the "dispersion" of the quasi family.
I may be missing something important, but it would seem that these two methods are essentially identical, perhaps differing only in their estimation method.
I tried both methods in R, using a dependent variable called "Similarity", which lies in the interval $(0,1)$:

    Call:
    betareg(formula = Similarity ~ N + NK + Step_ent, data = TapData, link = "logit")

    Coefficients (mean model with logit link):
                 Estimate Std. Error z value Pr(>|z|)
    (Intercept)  0.715175   0.067805  10.547   <2e-16 ***
    N           -0.063806   0.003858 -16.537   <2e-16 ***
    NK          -0.362716   0.015008 -24.168   <2e-16 ***
    Step_ent    -0.696895   0.070233  -9.923   <2e-16 ***

    Phi coefficients (precision model with identity link):
          Estimate Std. Error z value Pr(>|z|)
    (phi)  10.6201     0.2084   50.96   <2e-16 ***
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Type of estimator: ML (maximum likelihood)
    Log-likelihood:  3817 on 5 Df
    Pseudo R-squared: 0.2633
    Number of iterations: 18 (BFGS) + 1 (Fisher scoring)


    Call:
    glm(formula = Similarity ~ N + NK + Step_ent, family = quasi(link = "logit",
        variance = "mu(1-mu)"), data = TapData)

    Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
    (Intercept)  0.777451   0.069809  11.137   <2e-16 ***
    N           -0.069348   0.003983 -17.411   <2e-16 ***
    NK          -0.364702   0.016232 -22.468   <2e-16 ***
    Step_ent    -0.704680   0.072491  -9.721   <2e-16 ***
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    (Dispersion parameter for quasi family taken to be 0.0838547)

        Null deviance: 566.25  on 4974  degrees of freedom
    Residual deviance: 422.76  on 4971  degrees of freedom
    AIC: NA

    Number of Fisher Scoring iterations: 4

The coefficients of the two models are similar, as are their standard errors. The $\phi$ parameter is also similar: I assume that the dispersion parameter reported by glm and $\phi$ are related by $\phi = 1/\text{Dispersion} - 1$, in which case the two estimates of $\phi$ are 10.6201 (betareg) and 10.9254 (glm).
However, none of these values is identical.
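The assumed relationship $\phi = 1/\text{Dispersion} - 1$ can be checked on simulated data. The following is a minimal sketch in base R, not a reanalysis of TapData: the data-generating coefficients, the sample size, and the variable names (`x`, `y`, `fit_quasi`, `phi_from_disp`) are all made up for illustration.

```r
# Simulate a beta-distributed response with a logit-linear mean and known phi,
# fit the quasi-likelihood GLM, and convert its dispersion back to phi.
set.seed(42)
n   <- 5000
phi <- 10                               # true precision (assumption)
x   <- rnorm(n)
mu  <- plogis(0.5 - 0.5 * x)            # true mean on the logit scale
y   <- rbeta(n, mu * phi, (1 - mu) * phi)

fit_quasi <- glm(y ~ x,
                 family = quasi(link = "logit", variance = "mu(1-mu)"))

disp          <- summary(fit_quasi)$dispersion  # Pearson-based estimate
phi_from_disp <- 1 / disp - 1                   # phi = 1/Dispersion - 1

print(c(dispersion = disp, phi_from_disp = phi_from_disp, phi_true = phi))
```

If the relationship holds, `phi_from_disp` should land close to the value of `phi` used in the simulation, up to sampling error.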
Is this because the only thing that actually differs in the two methods is their estimation procedure? Or is there some more fundamental difference I am missing? Also, is there any reason to prefer one method over the other?
Best Answer
You're correct that the mean and variance functions are of the same form.
This suggests that in very large samples, as long as you don't have observations really close to 1 or 0, the two approaches should tend to give quite similar answers, because in that situation observations will have similar relative weights under both.
But in smaller samples where some of the continuous proportions approach the bounds, the differences can grow larger because the relative weights given by the two approaches will differ; if the points that get different weights are also relatively influential (more extreme in x-space), the differences may in some cases become substantial.
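One way to see where the different weights come from is to write down the estimating equations for the regression coefficients $\beta$ under a logit link. This is a sketch under standard assumptions (independent observations, constant $\phi$, $\eta_i = x_i^\top\beta$, $\mu_i = \operatorname{logit}^{-1}(\eta_i)$); the symbols $y_i^*$ and $\mu_i^*$ are shorthand introduced here, with $\psi$ the digamma function.

$$
\begin{aligned}
\text{quasi-likelihood:}\quad & \sum_i \frac{y_i - \mu_i}{\mu_i(1-\mu_i)}\,\frac{\partial \mu_i}{\partial \eta_i}\, x_i \;=\; \sum_i (y_i - \mu_i)\, x_i \;=\; 0, \\
\text{beta ML:}\quad & \phi \sum_i \left(y_i^* - \mu_i^*\right) \mu_i(1-\mu_i)\, x_i \;=\; 0, \\
& y_i^* = \log\frac{y_i}{1-y_i}, \qquad \mu_i^* = \psi(\mu_i\phi) - \psi\big((1-\mu_i)\phi\big).
\end{aligned}
$$

The quasi-likelihood equations are linear in $y_i$, whereas the beta score depends on $y_i$ only through $\log\{y_i/(1-y_i)\}$, which is unbounded as $y_i$ approaches 0 or 1. That is consistent with the two fits agreeing when the data stay away from the bounds and diverging when some observations sit close to them.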
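In beta regression you'd be estimating via ML, whereas a quasibinomial-type model, at least one fitted in R, is not estimated that way: the help for the quasi families (see ?family) notes that glm does not compute maximum-likelihood estimates in that model.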
I think betareg lets you extract the $h_{ii}$ values (the diagonal of the hat matrix), and you can do the same for GLMs, so for the two fitted models you can compare an approximation of each observation's relative influence (its "weight") on its own fitted value, since the other components of the ratio of influences should cancel, or nearly so. This should give a quick sense of which observations are treated most differently by the two approaches. [One might do it more exactly by actually perturbing the observations one by one and seeing the change in fit per unit change in value.]
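A rough sketch of that comparison in R follows, on simulated data rather than the data from the question. It assumes that the betareg package is installed and that it supplies a `hatvalues()` method for its fitted models, which is my understanding but worth checking against the package documentation; the betareg part is skipped if the package is unavailable. The data-generating values and the names `dat`, `h_quasi`, `h_beta` and `ratio` are illustrative only.

```r
# Compare each observation's share of total leverage under the two fits.
set.seed(7)
n   <- 500
phi <- 8                                  # illustrative precision (assumption)
x   <- rnorm(n)
mu  <- plogis(1 + 0.6 * x)                # pushes some responses towards 1
y   <- rbeta(n, mu * phi, (1 - mu) * phi)

# Safeguard: beta regression needs y strictly inside (0, 1).
eps <- 1e-6
y   <- pmin(pmax(y, eps), 1 - eps)
dat <- data.frame(y = y, x = x)

fit_quasi <- glm(y ~ x, data = dat,
                 family = quasi(link = "logit", variance = "mu(1-mu)"))
h_quasi   <- hatvalues(fit_quasi)         # h_ii for the quasi-likelihood GLM

if (requireNamespace("betareg", quietly = TRUE)) {
  fit_beta <- betareg::betareg(y ~ x, data = dat, link = "logit")
  h_beta   <- hatvalues(fit_beta)         # assumed hatvalues method for betareg

  # Ratio of relative leverages; values far from 1 flag observations that
  # the two approaches weight most differently.
  ratio <- (h_beta / sum(h_beta)) / (h_quasi / sum(h_quasi))
  worst <- order(abs(log(ratio)), decreasing = TRUE)
  print(head(cbind(dat, ratio = ratio)[worst, ], 10))
}
```

Sorting by `abs(log(ratio))` treats over- and under-weighting symmetrically; plotting `ratio` against `y` would be another way to see whether the discrepancies concentrate near the bounds.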
Note that the betareg vignette gives some discussion of the connection between these models at the end of section 2.