Solved – Why use the logit link in beta regression

beta-regression, logit

Recently, I have been interested in implementing a beta regression model for an outcome that is a proportion. Note that this outcome would not fit into a binomial context, because there is no meaningful concept of a discrete "success" here. In fact, the outcome is a proportion of durations: the numerator is the number of seconds during which a certain condition was active, and the denominator is the total number of seconds during which the condition was eligible to be active. I apologize for the vagueness, but I don't want to focus too much on this precise context, because I realize there are a variety of ways such a process could be modeled besides beta regression, and for now I am more interested in the theoretical questions that have arisen in my attempts to implement such a model (though I am, of course, open to any suggestions pointing me towards interesting alternative modeling strategies if you believe a beta regression is entirely inappropriate).

In any case, all of the resources I have been able to find have indicated that beta regression is typically fit using a logit (or probit/cloglog) link, and the parameters interpreted as changes in log-odds. However, I have yet to find a reference that actually provides any real justification for why one would want to use this link.

The original Ferrari & Cribari-Neto (2004) paper doesn't provide a justification; they note only that the logit function is "particularly useful" due to the odds ratio interpretation of the exponentiated parameters. Other sources allude to a desire to map from the interval (0,1) to the real line. However, do we necessarily need a link function for such a mapping, given that we are already assuming a beta distribution? What benefits does the link function provide above and beyond the constraints imposed by assuming the beta distribution to begin with? I've run a couple of quick simulations and haven't seen predictions outside the (0,1) interval with an identity link, even when simulating from beta distributions whose density is largely concentrated near 0 or 1, but perhaps my simulations haven't been general enough to catch some of the pathologies.
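For concreteness, here is a minimal version of the kind of simulation I mean (the sample size, coefficients, and precision below are arbitrary, just for illustration):

library("betareg")
set.seed(1)
n <- 500
x <- runif(n)
mu <- 0.2 + 0.5 * x                      # true mean is linear in x and stays inside (0, 1)
phi <- 30                                # beta precision parameter
y <- rbeta(n, shape1 = mu * phi, shape2 = (1 - mu) * phi)
m <- betareg(y ~ x, link = make.link("identity"))
range(fitted(m))                         # in-sample fitted values stay inside (0, 1) here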

It seems to me, based on how individuals in practice interpret the parameter estimates from beta regression models (i.e., as odds ratios), that they are implicitly making inference with respect to the odds of a "success"; that is, they are using beta regression as a substitute for a binomial model. Perhaps this is appropriate in some contexts, given the relationship between the beta and binomial distributions, but it seems to me that this should be a special case rather than the general one. In this question, an answer is provided for interpreting the odds ratio with respect to the continuous proportion rather than the outcome, but it seems to me unnecessarily cumbersome to try to interpret things this way, as opposed to using, say, a log or identity link and interpreting % changes or unit shifts.
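To make the interpretation I am questioning concrete: under a logit link, $\exp(\hat\beta_j)$ is the multiplicative change in $\hat\mu/(1-\hat\mu)$, the "odds" of the mean proportion, per unit increase in the regressor. A small illustration (the model here is just an example):

library("betareg")
data("GasolineYield", package = "betareg")
m <- betareg(yield ~ temp, data = GasolineYield)   # default logit link
exp(coef(m)["temp"])   # multiplicative change in mu/(1 - mu) per unit increase in temp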

So, why do we use the logit link for beta regression models? Is it simply a matter of convenience, to relate it to binomial models?

Best Answer

Justification of the link function: A link function $g(\mu): (0,1) \rightarrow \mathbb{R}$ ensures that all fitted values $\hat \mu = g^{-1}(x^\top \hat \beta)$ are always in $(0, 1)$. This may not matter much in some applications, e.g., because the predictions are only evaluated in-sample or are not too close to 0 or 1. But it may matter in others, and you typically do not know in advance whether it matters or not. Typical problems I have seen include: evaluating predictions at new $x$ values that are (slightly) outside the range of the original learning sample, or finding suitable starting values. For the latter, consider:

library("betareg")
data("GasolineYield", package = "betareg")
betareg(yield ~ batch + temp, data = GasolineYield, link = make.link("identity"))
## Error in optim(par = start, fn = loglikfun, gr = if (temporary_control$use_gradient) gradfun else NULL,  : 
##   initial value in 'vmmin' is not finite

But, of course, one can simply try both options and see whether problems with the identity link occur and/or whether it improves the fit of the model.
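As a sketch of that workflow (the tryCatch fallback is just one way to organize it):

library("betareg")
data("GasolineYield", package = "betareg")
fit_identity <- tryCatch(
  betareg(yield ~ batch + temp, data = GasolineYield, link = make.link("identity")),
  error = function(e) NULL)   # the identity link fails here, as shown above
fit_logit <- betareg(yield ~ batch + temp, data = GasolineYield)   # default logit link
if (is.null(fit_identity)) AIC(fit_logit) else AIC(fit_identity, fit_logit)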

Interpretation of the parameters: I agree that interpreting parameters in models with link functions is more difficult than in models with an identity link, and practitioners often get it wrong. However, I have also often seen misinterpretations of the parameters in linear probability models (binary regressions with identity link, typically fit by least squares). The assumption that marginal effects are constant cannot hold if predictions get close enough to 0 or 1, and one would need to be really careful. E.g., for an observation with $\hat \mu = 0.01$, an increase in $x$ cannot lead to a decrease of $\hat \mu$ of, say, $0.02$. But this is often treated very sloppily in those scenarios. Hence, I would argue that for a limited-response model the parameters from any link function need to be interpreted carefully and might need some practice. My usual advice is therefore (as shown in the other discussion you linked in your question) to look at the effects for regressor configurations of interest. These are easier to interpret and often (but not always) rather similar (from a practical perspective) across different link functions. See the sketch below.
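For instance, one might compare predicted means under two links at a few covariate settings of interest (the particular settings below are illustrative, not from any specific analysis):

library("betareg")
data("GasolineYield", package = "betareg")
m_logit  <- betareg(yield ~ batch + temp, data = GasolineYield, link = "logit")
m_probit <- betareg(yield ~ batch + temp, data = GasolineYield, link = "probit")
nd <- data.frame(batch = factor(1, levels = levels(GasolineYield$batch)),
                 temp = c(300, 350, 400))
cbind(nd, logit = predict(m_logit, nd), probit = predict(m_probit, nd))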
