Solved – Why use the logit link in beta regression

beta-regression, logit

Recently, I have been interested in implementing a beta regression model for an outcome that is a proportion. Note that this outcome would not fit into a binomial context, because there is no meaningful concept of a discrete "success" here. In fact, the outcome is a proportion of durations: the numerator is the number of seconds during which a certain condition was active, and the denominator is the total number of seconds during which the condition was eligible to be active. I apologize for the vagueness, but I don't want to focus too much on this precise context, because I realize there are a variety of ways such a process could be modeled besides beta regression, and for now I am more interested in the theoretical questions that have arisen in my attempts to implement such a model (though I am, of course, open to any suggestions pointing me towards interesting alternative modeling strategies if you believe a beta regression is entirely inappropriate).

In any case, all of the resources I have been able to find have indicated that beta regression is typically fit using a logit (or probit/cloglog) link, and the parameters interpreted as changes in log-odds. However, I have yet to find a reference that actually provides any real justification for why one would want to use this link.

The original Ferrari & Cribari-Neto (2004) paper doesn't provide a justification; they note only that the logit function is "particularly useful" due to the odds ratio interpretation of the exponentiated parameters. Other sources allude to a desire to map from the interval (0,1) to the real line. However, do we necessarily need a link function for such a mapping, given that we are already assuming a beta distribution? What benefits does the link function provide above and beyond the constraints imposed by assuming the beta distribution to begin with? I've run a couple of quick simulations and haven't seen predictions outside the (0,1) interval with an identity link, even when simulating from beta distributions whose density is largely concentrated near 0 or 1, but perhaps my simulations haven't been general enough to catch some of the pathologies.
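For concreteness, here is a minimal version of the kind of simulation I mean (the sample size, coefficients, and precision below are arbitrary, just for illustration):

library("betareg")
set.seed(1)
n <- 500
x <- runif(n)
mu <- 0.2 + 0.5 * x                      # true mean is linear in x and stays inside (0, 1)
phi <- 30                                # beta precision parameter
y <- rbeta(n, shape1 = mu * phi, shape2 = (1 - mu) * phi)
m <- betareg(y ~ x, link = make.link("identity"))
range(fitted(m))                         # in-sample fitted values stay inside (0, 1) here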

It seems to me, based on how individuals in practice interpret the parameter estimates from beta regression models (i.e., as odds ratios), that they are implicitly making inference with respect to the odds of a "success"; that is, they are using beta regression as a substitute for a binomial model. Perhaps this is appropriate in some contexts, given the relationship between the beta and binomial distributions, but it seems to me that this should be a special case rather than the general one. In this question, an answer is provided for interpreting the odds ratio with respect to the continuous proportion rather than the outcome, but it seems to me unnecessarily cumbersome to try to interpret things this way, as opposed to using, say, a log or identity link and interpreting % changes or unit shifts.
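To make the interpretation I am questioning concrete: under a logit link, $\exp(\hat\beta_j)$ is the multiplicative change in $\hat\mu/(1-\hat\mu)$, the "odds" of the mean proportion, per unit increase in the regressor. A small illustration (the model here is just an example):

library("betareg")
data("GasolineYield", package = "betareg")
m <- betareg(yield ~ temp, data = GasolineYield)   # default logit link
exp(coef(m)["temp"])   # multiplicative change in mu/(1 - mu) per unit increase in temp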

So, why do we use the logit link for beta regression models? Is it simply a matter of convenience, to relate it to binomial models?

Best Answer

Justification of the link function: A link function $g(\mu): (0,1) \rightarrow \mathbb{R}$ ensures that all fitted values $\hat \mu = g^{-1}(x^\top \hat \beta)$ are always in $(0, 1)$. This may not matter much in some applications, e.g., because the predictions are only evaluated in-sample or are not too close to 0 or 1. But it may matter in others, and you typically do not know in advance whether it matters or not. Typical problems I have seen include: evaluating predictions at new $x$ values that are (slightly) outside the range of the original learning sample, or finding suitable starting values. For the latter, consider:

library("betareg")
data("GasolineYield", package = "betareg")
betareg(yield ~ batch + temp, data = GasolineYield, link = make.link("identity"))
## Error in optim(par = start, fn = loglikfun, gr = if (temporary_control$use_gradient) gradfun else NULL,  : 
##   initial value in 'vmmin' is not finite

But, of course, one can simply try both options and see whether problems with the identity link occur and/or whether it improves the fit of the model.
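As a sketch of that workflow (the tryCatch fallback is just one way to organize it):

library("betareg")
data("GasolineYield", package = "betareg")
fit_identity <- tryCatch(
  betareg(yield ~ batch + temp, data = GasolineYield, link = make.link("identity")),
  error = function(e) NULL)   # the identity link fails here, as shown above
fit_logit <- betareg(yield ~ batch + temp, data = GasolineYield)   # default logit link
if (is.null(fit_identity)) AIC(fit_logit) else AIC(fit_identity, fit_logit)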

Interpretation of the parameters: I agree that interpreting parameters in models with link functions is more difficult than in models with an identity link, and practitioners often get it wrong. However, I have also often seen misinterpretations of the parameters in linear probability models (binary regressions with identity link, typically fit by least squares). The assumption that marginal effects are constant cannot hold if predictions get close enough to 0 or 1, and one would need to be really careful. E.g., for an observation with $\hat \mu = 0.01$, an increase in $x$ cannot lead to a decrease of $\hat \mu$ of, say, $0.02$. But this is often treated very sloppily in those scenarios. Hence, I would argue that for a limited-response model the parameters from any link function need to be interpreted carefully and might need some practice. My usual advice is therefore (as shown in the other discussion you linked in your question) to look at the effects for regressor configurations of interest. These are easier to interpret and often (but not always) rather similar (from a practical perspective) across different link functions. See the sketch below.
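For instance, one might compare predicted means under two links at a few covariate settings of interest (the particular settings below are illustrative, not from any specific analysis):

library("betareg")
data("GasolineYield", package = "betareg")
m_logit  <- betareg(yield ~ batch + temp, data = GasolineYield, link = "logit")
m_probit <- betareg(yield ~ batch + temp, data = GasolineYield, link = "probit")
nd <- data.frame(batch = factor(1, levels = levels(GasolineYield$batch)),
                 temp = c(300, 350, 400))
cbind(nd, logit = predict(m_logit, nd), probit = predict(m_probit, nd))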
