Logit GLM and logit beta regression: Practical difference in the interpretation of the coefficients

Tags: beta-regression, generalized linear model, interpretation, logistic

Terminology: By logit GLM I mean a generalized linear model with a binomial distribution and a logit link function. By beta regression I mean beta regression with a logit link function.

I understand – or at least I think/hope I do – that while logit GLM and beta regression can be applied to the same examples, they have different theoretical underpinnings and can therefore give different numerical results, because they maximize different likelihoods. My question is about how we interpret the coefficients.

In this answer it is pointed out that beta regression models $\mathrm{logit}(E(y))$ while logit GLM models $E(\mathrm{logit}(y))$. We can apply the logistic function to invert $\mathrm{logit}(E(y))$, but we cannot do this to $E(\mathrm{logit}(y))$. Hence interpretation of the coefficients is straightforward in beta regression but not in logit GLM. This tallies with what it says here (page 1):

This approach [logit GLM], nonetheless, has shortcomings. First, the regression parameters are interpretable in terms of the mean of $\tilde{y}$, and not in terms of the mean of $y$ (given Jensen’s inequality).

But what is the interpretation of the coefficients in beta regression? This is made clear on page 6 of the original paper on beta regression: If the value of the $i^\mathrm{th}$ regressor is increased by $c$ units and all other independent variables remain unchanged, then $e^{c\beta_i}$ is the odds ratio. So far, so good. However, the same interpretation of the coefficients in logit GLM is given here and here. Indeed, in this answer it states:

Thus you should realize that we [in the context of beta regression] are basically using the same results and interpretations from standard generalized linear modeling (under the logit link).

Is this interpretation of the coefficients in logit GLM a (widespread) misunderstanding? Or is it such that any theoretical qualms are outweighed by the practical utility of this interpretation of the coefficients?

Best Answer

You state that by "logit GLM [you] mean a generalized linear model with a binomial distribution and a logit link function". (It would be common to refer to that as just 'logistic regression'.) It's important to note that that differs from a General Linear Model (GLM) where the $Y$ variable has been transformed (typically / primarily) to achieve conditional normality and homoscedasticity.

The quote from @ŁukaszDeryło's answer to Logit transformation or beta regression for proportion data

(So in beta regression we have ${\rm logit}(E(y))$ modeled while in linear regression with logit-transformed dependent variable we have $E({\rm logit}(y))$. These two are not the same.)

and the quote from the betareg vignette, Beta Regression in R,

(How should one perform a regression analysis in which the dependent variable (or response variable), $y$, assumes values in the standard unit interval $(0, 1)$? The usual practice used to be to transform the data so that the transformed response, say $\tilde{y}$, assumes values in the real line and then apply a standard linear regression analysis. A commonly used transformation is the logit, $\tilde{y} = \log(y/(1 - y))$. This approach, nonetheless, has shortcomings. First, the regression parameters are interpretable in terms of the mean of $\tilde{y}$ and not in terms of the mean of $y$ (given Jensen’s inequality).)

both pertain to linear models (with logit transformed $Y$-variables), not to logistic regression. I believe that may be the source of confusion. (It is annoying that two different things are both called 'GLM'.)
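
To see numerically why ${\rm logit}(E(y))$ and $E({\rm logit}(y))$ differ, here is a small base-R sketch. The Beta(2, 5) response and the sample size are arbitrary choices made for illustration; they do not come from either quoted source.

```r
# Simulate proportions in (0, 1); Beta(2, 5) is an arbitrary illustrative choice.
set.seed(42)
y <- rbeta(1e5, shape1 = 2, shape2 = 5)

# logit of the mean: the quantity a logit-link model (logistic or beta) works with
logit_of_mean <- qlogis(mean(y))

# mean of the logit: what a linear model fitted to logit(y) works with
mean_of_logit <- mean(qlogis(y))

# Theoretical counterparts for a Beta(a, b) variable:
#   logit(E(y)) = log(a / b),  E(logit(y)) = digamma(a) - digamma(b)
theory_logit_of_mean <- log(2 / 5)
theory_mean_of_logit <- digamma(2) - digamma(5)

print(c(logit_of_mean = logit_of_mean, mean_of_logit = mean_of_logit))
```

The two printed values differ, so back-transforming a fitted mean of $\tilde{y}$ with the logistic function does not recover the mean of $y$.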

For what it's worth, the standard interpretation applies to both beta regression with a logit link and logistic regression (which has a logit link by definition).
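
As a brief sketch of why the interpretation carries over: both models specify ${\rm logit}(\mu) = x^\top \beta$, where $\mu = E(y \mid x)$ is the success probability in logistic regression and the mean proportion in beta regression. Writing $\mu^*$ (notation introduced here only for illustration) for the mean after the $i^{\rm th}$ regressor is increased by $c$ units with the other regressors held fixed,

$$
\log\frac{\mu^*}{1-\mu^*} - \log\frac{\mu}{1-\mu} = c\beta_i
\quad\Longrightarrow\quad
\frac{\mu^*/(1-\mu^*)}{\mu/(1-\mu)} = e^{c\beta_i}.
$$

In both cases $e^{c\beta_i}$ is an odds ratio for $\mu$; only the meaning of $\mu$ differs between the two models.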

Related Solutions

Solved – standardized coefficients from glm logit

If the predictors you used in the different regression models are measured in the same way, it is no longer necessary to normalize the coefficients: because the predictors are already on the same scale, the coefficients are as well.

If not, first try to rescale the predictor variables.
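
One possible way to rescale is to standardize each predictor with `scale()` before fitting. The sketch below uses simulated data; the variable names, sample size, and true coefficients are made up for illustration.

```r
# Simulated predictors on very different scales (all values are illustrative).
set.seed(1)
n  <- 500
x1 <- rnorm(n, mean = 50,  sd = 10)
x2 <- rnorm(n, mean = 0.5, sd = 0.1)
y  <- rbinom(n, size = 1, prob = plogis(-4 + 0.05 * x1 + 3 * x2))

# Logistic regression on the raw and on the standardized predictors
fit_raw <- glm(y ~ x1 + x2, family = binomial)
fit_std <- glm(y ~ scale(x1) + scale(x2), family = binomial)

raw_slopes <- unname(coef(fit_raw)[c("x1", "x2")])
std_slopes <- unname(coef(fit_std)[-1])

# A standardized slope equals the raw slope times the predictor's standard deviation
rescaled_raw <- raw_slopes * c(sd(x1), sd(x2))
print(rbind(std_slopes, rescaled_raw))
```

Each standardized slope is the change in log-odds per one standard deviation of its predictor, which puts the coefficients on a comparable footing.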

Solved – Why use the logit link in beta regression

Justification of the link function: A link function $g(\mu): (0,1) \rightarrow \mathbb{R}$ assures that all fitted values $\hat \mu = g^{-1}(x^\top \hat \beta)$ are always in $(0, 1)$. This may not matter that much in some applications, e.g., because the predictions are only evaluated in-sample or are not too close to 0 or 1. But it may matter in some applications and you typically do not know in advance whether it matters or not. Typical problems I have seen include: evaluating predictions at new $x$ values that are (slightly) outside the range of the original learning sample or finding suitable starting values. For the latter consider:

library("betareg")
data("GasolineYield", package = "betareg")
betareg(yield ~ batch + temp, data = GasolineYield, link = make.link("identity"))
## Error in optim(par = start, fn = loglikfun, gr = if (temporary_control$use_gradient) gradfun else NULL,  : 
##   initial value in 'vmmin' is not finite

But, of course, one can simply try both options and see whether problems with the identity link occur and/or whether it improves the fit of the model.
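
The point about fitted values staying in $(0, 1)$ can be illustrated without fitting anything. In the sketch below the coefficients and the new $x$ values are hypothetical, and the same linear predictor is reused for both links purely to show the range behaviour; a real fit would give different coefficients under each link.

```r
# Hypothetical coefficients and new regressor values (not from any fitted model)
beta  <- c(intercept = 0.2, slope = 0.01)
x_new <- c(-50, 0, 50, 100)   # some values lie outside a plausible training range

eta <- unname(beta["intercept"] + beta["slope"] * x_new)

mu_identity <- eta           # identity link: the mean is the linear predictor itself
mu_logit    <- plogis(eta)   # logit link: the inverse link maps into (0, 1)

print(data.frame(x_new, mu_identity, mu_logit))
```

Under the identity link the predicted means at the extreme $x$ values fall below 0 or above 1, while the logit-link predictions cannot.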

Interpretation of the parameters: I agree that interpreting parameters in models with link functions is more difficult than in models with an identity link and practitioners often get it wrong. However, I have also often seen misinterpretations of the parameters in linear probability models (binary regressions with identity link, typically by least squares). The assumption that marginal effects are constant cannot hold if predictions get close enough to 0 or 1 and one would need to be really careful. E.g., for an observation with $\hat \mu = 0.01$ an increase in $x$ cannot lead to a decrease of $\hat \mu$ of, say, $0.02$. But this is often treated very sloppily in those scenarios. Hence, I would argue that for a limited response model the parameters from any link function need to be interpreted carefully and might need some practice. My usual advice is therefore (as shown in the other discussion you linked in your question) to look at the effects for regressor configurations of interest. These are easier to interpret and often (but not always) rather similar (from a practical perspective) for different link functions.
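
A minimal sketch of looking at effects for regressor configurations of interest under a logit link follows. The intercept, slope, and $x$ values are hypothetical rather than taken from a fitted model.

```r
# Hypothetical logit-link coefficients (illustration only)
b0 <- -3
b1 <- 0.8

# Regressor configurations of interest, and the same configurations with x + 1
x         <- c(0, 2, 4, 6)
mu_before <- plogis(b0 + b1 * x)
mu_after  <- plogis(b0 + b1 * (x + 1))

odds <- function(p) p / (1 - p)

effects <- data.frame(
  x          = x,
  mu_before  = mu_before,
  mu_after   = mu_after,
  change     = mu_after - mu_before,             # effect on the mean scale
  odds_ratio = odds(mu_after) / odds(mu_before)  # effect on the odds scale
)
print(effects)
```

The odds ratio is the same at every configuration and equals `exp(b1)`, whereas the change in $\hat \mu$ depends on where you start, which is why a single "marginal effect" number can mislead.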
