Beta Regression vs Quasi GLM – What is the Difference with Variance = $\mu(1-\mu)$?

Tags: beta-regression, binomial distribution, generalized linear model, lme4-nlme, quasi-likelihood

First let me give some background; I will summarize my questions at the end.

The Beta distribution, parameterized by its mean $\mu$ and precision $\phi$, has $\operatorname{Var}(Y) = \operatorname{V}(\mu)/(\phi+1)$, where $\operatorname{V}(\mu) = \mu(1-\mu)$ is the variance function.
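As a quick check of this parameterization, the Python sketch below maps $(\mu, \phi)$ to the usual shape parameters $\alpha = \mu\phi$ and $\beta = (1-\mu)\phi$. It then compares the variance formula with the textbook Beta variance and with a simulation. The function names are illustrative only.

```python
import random
import statistics

def beta_shapes(mu, phi):
    """Map mean mu and precision phi to Beta shape parameters (alpha, beta)."""
    return mu * phi, (1.0 - mu) * phi

def beta_variance(mu, phi):
    """Var(Y) = V(mu) / (phi + 1) with variance function V(mu) = mu * (1 - mu)."""
    return mu * (1.0 - mu) / (phi + 1.0)

def beta_variance_from_shapes(a, b):
    """Textbook Beta(a, b) variance: a * b / ((a + b)^2 * (a + b + 1))."""
    return a * b / ((a + b) ** 2 * (a + b + 1.0))

mu, phi = 0.3, 10.0
a, b = beta_shapes(mu, phi)
print(beta_variance(mu, phi), beta_variance_from_shapes(a, b))

# Monte Carlo check using the standard library's Beta sampler
random.seed(1)
draws = [random.betavariate(a, b) for _ in range(200_000)]
print(statistics.fmean(draws), statistics.pvariance(draws))
```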

In a beta regression (e.g., using the betareg package in R), the model assumes a beta-distributed response (conditional on the covariates) and estimates the fixed effects and the value of $\phi$.
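For reference, an ML beta regression maximizes the sum of Beta log-densities written in this mean/precision form. Below is a minimal Python sketch with one hypothetical covariate `x` and a logit link. betareg itself handles general designs and maximizes the log-likelihood numerically.

```python
import math

def beta_loglik(y, mu, phi):
    """Log-density of y in (0, 1) under a Beta with mean mu and precision phi.

    Uses shapes alpha = mu * phi and beta = (1 - mu) * phi.
    """
    a, b = mu * phi, (1.0 - mu) * phi
    return (math.lgamma(phi) - math.lgamma(a) - math.lgamma(b)
            + (a - 1.0) * math.log(y) + (b - 1.0) * math.log(1.0 - y))

def beta_regression_loglik(beta0, beta1, phi, x, y):
    """Total log-likelihood for the hypothetical model logit(mu_i) = beta0 + beta1 * x_i."""
    total = 0.0
    for xi, yi in zip(x, y):
        mu = 1.0 / (1.0 + math.exp(-(beta0 + beta1 * xi)))  # inverse logit
        total += beta_loglik(yi, mu, phi)
    return total
```

In outline, maximizing `beta_regression_loglik` over `(beta0, beta1, phi)` with a general-purpose optimizer is what "Type of estimator: ML" in the betareg output refers to.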

In a GLM fitted with glm, it is possible to use a "quasi" family with variance function $\mu(1-\mu)$. So here the model assumes errors with the same variance function as the beta distribution. The regression then estimates the fixed effects and the "dispersion" of the quasi family.
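To see what such a fit involves, here is a Python sketch of iteratively reweighted least squares (IRLS) for the quasi model with a logit link, $V(\mu)=\mu(1-\mu)$, and one hypothetical covariate. The logit is the canonical link for this variance function, so the estimating equations reduce to $\sum_i x_i (y_i - \mu_i) = 0$. The dispersion is estimated afterwards as the Pearson statistic divided by the residual degrees of freedom, which is how summary.glm estimates it for quasi families. The simulated data and coefficient values are assumptions for illustration.

```python
import math
import random

def expit(eta):
    return 1.0 / (1.0 + math.exp(-eta))

def quasi_logit_fit(x, y, iters=25, tol=1e-10):
    """IRLS for a quasi-likelihood GLM with logit link and V(mu) = mu(1 - mu).

    Hypothetical model: logit(mu_i) = b0 + b1 * x_i. Returns (b0, b1, dispersion).
    """
    b0, b1 = 0.0, 0.0
    n = len(y)
    for _ in range(iters):
        # Accumulate X'WX and X'Wz for the two-column design [1, x]
        s00 = s01 = s11 = t0 = t1 = 0.0
        for xi, yi in zip(x, y):
            eta = b0 + b1 * xi
            mu = expit(eta)
            w = mu * (1.0 - mu)        # working weight (dmu/deta)^2 / V(mu)
            z = eta + (yi - mu) / w    # working response
            s00 += w
            s01 += w * xi
            s11 += w * xi * xi
            t0 += w * z
            t1 += w * xi * z
        det = s00 * s11 - s01 * s01
        new_b0 = (s11 * t0 - s01 * t1) / det
        new_b1 = (s00 * t1 - s01 * t0) / det
        converged = abs(new_b0 - b0) + abs(new_b1 - b1) < tol
        b0, b1 = new_b0, new_b1
        if converged:
            break
    # Pearson estimate of the dispersion with n - p residual degrees of freedom
    pearson = 0.0
    for xi, yi in zip(x, y):
        mu = expit(b0 + b1 * xi)
        pearson += (yi - mu) ** 2 / (mu * (1.0 - mu))
    return b0, b1, pearson / (n - 2)

# Usage on simulated beta-distributed data (hypothetical b0 = 0.7, b1 = -1, phi = 10)
random.seed(42)
phi_true = 10.0
x = [random.uniform(0.0, 2.0) for _ in range(5000)]
y = []
for xi in x:
    m = expit(0.7 - 1.0 * xi)
    y.append(random.betavariate(m * phi_true, (1.0 - m) * phi_true))

b0_hat, b1_hat, disp = quasi_logit_fit(x, y)
print(b0_hat, b1_hat, disp, 1.0 / disp - 1.0)
```

The simulated responses really are beta-distributed, so the estimated dispersion should come out close to $1/(\phi+1) = 1/11$.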

I may be missing something important, but it would seem that these two methods are essentially identical, perhaps differing only in their estimation method.

I tried both methods in R, regressing a DV called "Similarity", which lies in the interval $(0,1)$, on three predictors:

Call:
betareg(formula = Similarity ~ N + NK + Step_ent, data = TapData, link = "logit")

Coefficients (mean model with logit link):
             Estimate Std. Error z value Pr(>|z|)    
(Intercept)  0.715175   0.067805  10.547   <2e-16 ***
N           -0.063806   0.003858 -16.537   <2e-16 ***
NK          -0.362716   0.015008 -24.168   <2e-16 ***
Step_ent    -0.696895   0.070233  -9.923   <2e-16 ***

Phi coefficients (precision model with identity link):
      Estimate Std. Error z value Pr(>|z|)    
(phi)  10.6201     0.2084   50.96   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

Type of estimator: ML (maximum likelihood)
Log-likelihood:  3817 on 5 Df
Pseudo R-squared: 0.2633
Number of iterations: 18 (BFGS) + 1 (Fisher scoring) 


Call:
glm(formula = Similarity ~ N + NK + Step_ent, family = quasi(link = "logit", 
variance = "mu(1-mu)"), data = TapData)

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.777451   0.069809  11.137   <2e-16 ***
N           -0.069348   0.003983 -17.411   <2e-16 ***
NK          -0.364702   0.016232 -22.468   <2e-16 ***
Step_ent    -0.704680   0.072491  -9.721   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for quasi family taken to be 0.0838547)

    Null deviance: 566.25  on 4974  degrees of freedom
Residual deviance: 422.76  on 4971  degrees of freedom
AIC: NA

Number of Fisher Scoring iterations: 4

The coefficients of the two models are similar, as are their standard errors. The $\phi$ parameter is also similar: I assume that the dispersion parameter reported by glm and $\phi$ are related by $\phi = 1/\text{Dispersion} - 1$ (from matching $\text{Dispersion}\cdot\mu(1-\mu)$ to $\mu(1-\mu)/(\phi+1)$), in which case the two estimates of $\phi$ are 10.6201 (betareg) and 10.9254 (glm).
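Here is the arithmetic behind that comparison as a small Python check. It uses the two printed estimates, and the variable names are only illustrative.

```python
betareg_phi = 10.6201        # (phi) from the betareg output
glm_dispersion = 0.0838547   # dispersion from the glm quasi fit

# Quasi model: Var(Y) = dispersion * mu(1 - mu); beta model: mu(1 - mu) / (phi + 1).
# Equating the two gives dispersion = 1 / (phi + 1).
implied_phi = 1.0 / glm_dispersion - 1.0
implied_dispersion = 1.0 / (betareg_phi + 1.0)
print(round(implied_phi, 4), round(implied_dispersion, 6))
```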

However, none of the corresponding values are identical.

Is this because the only thing that actually differs in the two methods is their estimation procedure? Or is there some more fundamental difference I am missing? Also, is there any reason to prefer one method over the other?

Best Answer

You're correct that the mean and variance functions are of the same form.

This suggests that in very large samples, as long as you don't have observations really close to 0 or 1, they should tend to give quite similar answers, because in that situation observations will have similar relative weights under both approaches.

But in smaller samples where some of the continuous proportions approach the bounds, the differences can grow larger because the relative weights given by the two approaches will differ; if the points that get different weights are also relatively influential (more extreme in x-space), the differences may in some cases become substantial.
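One way to see where the weights diverge is to compare each observation's contribution to the estimating equation for the linear predictor, at a common $\mu$ and $\phi$. For the quasi model with a logit link, this contribution is proportional to $y-\mu$. For the beta log-likelihood (Ferrari and Cribari-Neto, 2004), it is $\phi\,(y^* - \mu^*)\,\mu(1-\mu)$, with $y^* = \log\{y/(1-y)\}$ and $\mu^* = \psi(\mu\phi) - \psi((1-\mu)\phi)$, where $\psi$ is the digamma function. The Python sketch below is illustrative only. It fixes $\mu = 0.5$ and $\phi = 10$, and it sets the quasi dispersion to $1/(\phi+1)$, which real fits would not match exactly.

```python
import math

def digamma(x):
    """Digamma function for x > 0: upward recurrence, then an asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x          # psi(x) = psi(x + 1) - 1/x
        x += 1.0
    inv2 = 1.0 / (x * x)
    result += (math.log(x) - 0.5 / x
               - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252)))
    return result

def quasi_contribution(y, mu, phi):
    """Quasi-score term for eta (canonical logit link): (y - mu) / dispersion,
    with the dispersion matched to the beta model as 1 / (phi + 1)."""
    return (phi + 1.0) * (y - mu)

def beta_contribution(y, mu, phi):
    """Beta log-likelihood score term for eta: phi * (y* - mu*) * dmu/deta."""
    y_star = math.log(y / (1.0 - y))                           # logit(y)
    mu_star = digamma(mu * phi) - digamma((1.0 - mu) * phi)    # E[y*]
    return phi * (y_star - mu_star) * mu * (1.0 - mu)          # dmu/deta = mu(1 - mu)

mu, phi = 0.5, 10.0
ratios = []
for y in (0.4, 0.1, 0.01, 0.001):
    q, b = quasi_contribution(y, mu, phi), beta_contribution(y, mu, phi)
    ratios.append(b / q)
    print(f"y={y:<6} quasi={q:8.3f} beta={b:8.3f} beta/quasi={b / q:5.2f}")
```

Because $y^*$ is unbounded as $y \to 0$ or $1$, the beta contribution grows relative to the quasi one as $y$ approaches a bound. This is one concrete source of the differing relative weights.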

In beta regression you'd be estimating via ML, whereas for a quasibinomial model (at least one fitted in R), note this comment in R's help page for family:

The quasibinomial and quasipoisson families differ from the binomial and poisson families only in that the dispersion parameter is not fixed at one, so they can model over-dispersion. For the binomial case see McCullagh and Nelder (1989, pp. 124–8). Although they show that there is (under some restrictions) a model with variance proportional to mean as in the quasi-binomial model, note that glm does not compute maximum-likelihood estimates in that model. The behaviour of S is closer to the quasi- variants.

I think you can obtain $h_{ii}$ (leverage) values from betareg, as you can for GLMs. For the two fitted models you can then compare an approximation of each observation's relative influence (or "weight") on its own fitted value, since the other components of the ratio of influences should cancel, or nearly so. This should give a quick sense of which observations are treated most differently by the two approaches. [One might do it more exactly by actually perturbing the observations one by one and seeing the change in fit per unit change in value.]
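Here is a rough Python sketch of such a comparison, under explicit assumptions:

- Both fits share the same hypothetical fitted means. In practice they will differ slightly.
- The quasi GLM uses working weights $\mu(1-\mu)$.
- The beta model uses expected-information weights proportional to $\{\psi'(\mu\phi) + \psi'((1-\mu)\phi)\}\,\mu^2(1-\mu)^2$ (Ferrari and Cribari-Neto, 2004), where $\psi'$ is the trigamma function.

This may not match exactly what betareg reports. It only illustrates how the leverages $h_{ii}$ shift between the two weightings.

```python
import math

def trigamma(x):
    """Trigamma function for x > 0: upward recurrence, then an asymptotic series."""
    result = 0.0
    while x < 6.0:
        result += 1.0 / (x * x)    # psi'(x) = psi'(x + 1) + 1/x^2
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    result += inv * (1.0 + inv * (0.5 + inv * (1.0 / 6 - inv2 * (1.0 / 30 - inv2 * (1.0 / 42 - inv2 / 30)))))
    return result

def hat_values(x, weights):
    """Diagonal of W^{1/2} X (X'WX)^{-1} X' W^{1/2} for the two-column design [1, x]."""
    s00 = sum(weights)
    s01 = sum(w * xi for w, xi in zip(weights, x))
    s11 = sum(w * xi * xi for w, xi in zip(weights, x))
    det = s00 * s11 - s01 * s01
    return [w * (s11 - 2.0 * s01 * xi + s00 * xi * xi) / det for w, xi in zip(weights, x)]

# Hypothetical shared fit: logit(mu) = b0 + b1 * x, precision phi
b0, b1, phi = 0.7, -2.0, 10.6
x = [i / 10 for i in range(-20, 21)]
mu = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]

w_quasi = [m * (1.0 - m) for m in mu]
w_beta = [(trigamma(m * phi) + trigamma((1.0 - m) * phi)) * (m * (1.0 - m)) ** 2 for m in mu]

h_quasi = hat_values(x, w_quasi)
h_beta = hat_values(x, w_beta)
for xi, m, hq, hb in zip(x[::10], mu[::10], h_quasi[::10], h_beta[::10]):
    print(f"x={xi:5.1f} mu={m:.3f} h_quasi={hq:.4f} h_beta={hb:.4f}")
```

Both sets of $h_{ii}$ sum to the number of coefficients (here 2). With these values, the beta weights should favour fitted means near 0 or 1 relative to the quasi weights, so that is where the two columns should differ most.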

Note that the betareg vignette gives some discussion of the connection between these models at the end of section 2.