Logit GLM and logit beta regression: Practical difference in the interpretation of the coefficients

beta-regression, generalized linear model, interpretation, logistic

Terminology: By logit GLM I mean a generalized linear model with a binomial distribution and a logit link function. By beta regression I mean beta regression with a logit link function.

I understand – or at least I think/hope I do – that while logit GLM and beta regression can be applied to the same examples, they have different theoretical underpinnings and thus can lead to different results computationally because they maximize different likelihoods. My question is about how we interpret the coefficients.

In this answer it is pointed out that beta regression models $\mathrm{logit}(E(y))$ while logit GLM models $E(\mathrm{logit}(y))$. We can apply the logistic function to invert $\mathrm{logit}(E(y))$, but we cannot do this to $E(\mathrm{logit}(y))$. Hence interpretation of the coefficients is straightforward in beta regression but not in logit GLM. This tallies with what it says here (page 1):

This approach [logit GLM], nonetheless, has shortcomings. First, the regression parameters are interpretable in terms of the mean of $\tilde{y}$, and not in terms of the mean of $y$ (given Jensen’s inequality).

But what is the interpretation of the coefficients in beta regression? This is made clear on page 6 of the original paper on beta regression: If the value of the $i^\mathrm{th}$ regressor is increased by $c$ units and all other independent variables remain unchanged, then $e^{c\beta_i}$ is the odds ratio. So far, so good. However, the same interpretation of the coefficients in logit GLM is given here and here. Indeed, in this answer it states:

Thus you should realize that we [in the context of beta regression] are basically using the same results and interpretations from standard generalized linear modeling (under the logit link).

Is this interpretation of the coefficients in logit GLM a (widespread) misunderstanding? Or is it such that any theoretical qualms are outweighed by the practical utility of this interpretation of the coefficients?

Best Answer

You state that by "logit GLM [you] mean a generalized linear model with a binomial distribution and a logit link function". (It would be common to refer to that as just 'logistic regression'.) It's important to note that this differs from a General Linear Model (also abbreviated 'GLM') in which the $Y$ variable has been transformed, typically to achieve conditional normality and homoscedasticity.

The quote from @ŁukaszDeryło's answer to Logit transformation or beta regression for proportion data

(So in beta regression we have ${\rm logit}(E(y))$ modeled while in linear regression with logit-transformed dependent variable we have $E({\rm logit}(y))$. These two are not the same.)

and the quote from the betareg vignette, Beta Regression in R,

(How should one perform a regression analysis in which the dependent variable (or response variable), $y$, assumes values in the standard unit interval $(0, 1)$? The usual practice used to be to transform the data so that the transformed response, say $\tilde{y}$, assumes values in the real line and then apply a standard linear regression analysis. A commonly used transformation is the logit, $\tilde{y} = \log(y/(1 - y))$. This approach, nonetheless, has shortcomings. First, the regression parameters are interpretable in terms of the mean of $\tilde{y}$ and not in terms of the mean of $y$ (given Jensen’s inequality).)

both pertain to linear models (with logit transformed $Y$-variables), not to logistic regression. I believe that may be the source of confusion. (It is annoying that two different things are both called 'GLM'.)
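
The gap between ${\rm logit}(E(y))$ and $E({\rm logit}(y))$ described in those quotes is easy to check numerically. The following is a minimal sketch, assuming proportions drawn from a Beta(2, 5) distribution; the shape parameters, sample size, and seed are arbitrary choices for illustration and come from neither quoted source.

```python
import math
import random


def logit(p):
    """Log-odds of a proportion p in (0, 1)."""
    return math.log(p / (1.0 - p))


def compare_logit_of_mean_and_mean_of_logit(a=2.0, b=5.0, n=100_000, seed=0):
    """Simulate y ~ Beta(a, b) and return (logit(mean(y)), mean(logit(y))).

    The shape parameters, sample size, and seed are illustrative assumptions.
    """
    rng = random.Random(seed)
    ys = [rng.betavariate(a, b) for _ in range(n)]
    logit_of_mean = logit(sum(ys) / n)               # what a logit-link mean model targets
    mean_of_logit = sum(logit(y) for y in ys) / n    # what OLS on logit(y) targets
    return logit_of_mean, mean_of_logit


logit_of_mean, mean_of_logit = compare_logit_of_mean_and_mean_of_logit()
print(f"logit(E[y]) ~ {logit_of_mean:.3f}")
print(f"E[logit(y)] ~ {mean_of_logit:.3f}")
```

For Beta(2, 5) the exact values are $\log(0.4) \approx -0.916$ and $\psi(2) - \psi(5) = -13/12 \approx -1.083$, so the two simulated numbers should land near those and visibly disagree with each other.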


For what it's worth, the standard interpretation applies to both beta regression with a logit link and logistic regression (which has a logit link by definition).
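
To make that interpretation concrete: under a logit link with a single regressor, ${\rm logit}(\mu) = \beta_0 + \beta_1 x$, so raising $x$ by $c$ multiplies the odds $\mu/(1-\mu)$ by $e^{c\beta_1}$ regardless of the starting value of $x$. Below is a minimal sketch with made-up coefficients; `b0`, `b1`, and `c` are hypothetical values, not estimates from any data.

```python
import math


def mean_from_linear_predictor(eta):
    """Inverse logit: map a linear predictor to a mean in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))


def odds(mu):
    """Odds corresponding to a mean (or probability) mu in (0, 1)."""
    return mu / (1.0 - mu)


def odds_ratio_for_shift(b0, b1, x, c):
    """Ratio of the odds at x + c to the odds at x under logit(mu) = b0 + b1 * x."""
    mu_before = mean_from_linear_predictor(b0 + b1 * x)
    mu_after = mean_from_linear_predictor(b0 + b1 * (x + c))
    return odds(mu_after) / odds(mu_before)


b0, b1, c = -1.0, 0.4, 2.0  # hypothetical coefficients and shift, for illustration only
for x in (0.0, 1.5, 3.0):
    # The two printed numbers agree at every x: the odds ratio is exp(c * b1).
    print(x, odds_ratio_for_shift(b0, b1, x, c), math.exp(c * b1))
```

Here $\mu$ is the expected proportion in beta regression and the success probability in logistic regression; the algebra relating $\beta_1$ to the odds ratio is the same in both cases.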