Solved – Logit transformation or beta regression for proportion data

beta-regression, logit, regression

I'm interested in knowing about the difference in interpretation between (1) linear regression on a logit transformed variable with values between 0 and 1 and (2) beta regression where the values between 0 and 1 are untransformed.

I'm reading the following paper on the use of beta regression:

https://cran.r-project.org/web/packages/betareg/vignettes/betareg.pdf

Specifically, I'm trying to figure out how my interpretation of my results will be different if I take a percentage outcome variable I have and either (1) use the logit transformation and use a normal model or (2) use beta regression. This is what the authors have to say on the matter:

"How should one perform a regression analysis in which the dependent variable (or response variable), $y$, assumes values in the standard unit interval (0, 1)? The usual practice used to be to transform the data so that the transformed response, say $\tilde{y}$, assumes values in the real line and then apply a standard linear regression analysis. A commonly used transformation is the logit, $\tilde{y} = \log(y/(1 - y))$. This approach, nonetheless, has shortcomings. First, the regression parameters are interpretable in terms of the mean of $\tilde{y}$, and not in terms of the mean of $y$ (given Jensen's inequality)."

Could somebody give me a less technical explanation of the authors' point here? I'm not really sure what Jensen's inequality is or why it applies here.

Here's another paper that makes a similar point:

https://onlinelibrary.wiley.com/doi/pdf/10.1002/sim.6179

They say:

"The logistic-normal model in [5], which assumes normal distribution for logit-transformed proportion responses, can provide a computationally convenient framework, but it suffers from an interpretation problem given that the expected value of response is not a simple logit function of the covariates."

I think this quote is probably referring to the issue identified in the first one but I'm still not quite grasping how.

This issue is closed. See the comments on the first response for the answer.

Best Answer

They mean that once you transform your dependent variable (e.g., from $y$ to ${\rm logit}(y)$), the parameters of the regression model tell you how the independent variables affect ${\rm logit}(y)$, not $y$ itself.

Suppose sex is one of your independent variables and you see a coefficient of 2 for males against females.

If you used the logit transformation, the interpretation would be that being male raises the expected ${\rm logit}(y)$ by 2. If you did not transform, you could say that being male raises the expected $y$ itself by 2 units (for example, 2 percentage points if $y$ is recorded in percent).
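To see what a coefficient on the logit scale implies for the proportion itself, here is a minimal Python sketch. It assumes the usual additive reading of a regression coefficient (a coefficient of 2 shifts the logit by 2), and the baseline values for females are hypothetical numbers chosen only for illustration.

```python
import math


def logit(p):
    """Map a proportion in (0, 1) to the real line."""
    return math.log(p / (1.0 - p))


def inv_logit(z):
    """Map a value on the real line back to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


coef_male = 2.0                 # coefficient from the example in the text
baselines = [0.10, 0.30, 0.50]  # hypothetical values of y for females

shifts = []
for p_female in baselines:
    # Add the coefficient on the logit scale, then map back to (0, 1).
    p_male = inv_logit(logit(p_female) + coef_male)
    shifts.append(p_male - p_female)
    print(f"female {p_female:.2f} -> male {p_male:.3f} "
          f"(change {p_male - p_female:+.3f})")
```

The same shift of 2 on the logit scale moves the proportion by a different amount at each baseline, which is why such a coefficient cannot be read directly as a change in $y$.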

EDIT: Beta regression applies the logit to the mean of the distribution assumed for the data (a beta distribution in this case), whereas linear regression with a logit-transformed dependent variable transforms the data themselves.

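So in beta regression we model ${\rm logit}(E(y))$, while in linear regression with a logit-transformed dependent variable we model $E({\rm logit}(y))$. These two are not the same: the logit is a nonlinear function, and the expectation of a nonlinear function of $y$ generally differs from that function applied to the expectation of $y$. That is what the reference to Jensen's inequality is about.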
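The gap between the two quantities can be checked numerically. The sketch below uses only the Python standard library and draws from a beta distribution with shape parameters 2 and 5; those parameters, the seed, and the sample size are arbitrary assumptions for illustration, not values from either paper.

```python
import math
import random


def logit(p):
    """Map a proportion in (0, 1) to the real line."""
    return math.log(p / (1.0 - p))


random.seed(0)

# Hypothetical shape parameters for the beta distribution.
a, b = 2.0, 5.0
ys = [random.betavariate(a, b) for _ in range(100_000)]

# Quantity on the scale that beta regression works with: logit of the mean.
logit_of_mean = logit(sum(ys) / len(ys))

# Quantity that regression on logit-transformed data works with: mean of the logits.
mean_of_logit = sum(logit(y) for y in ys) / len(ys)

print(f"logit(E[y]) is roughly {logit_of_mean:.3f}")
print(f"E[logit(y)] is roughly {mean_of_logit:.3f}")
```

For these parameters the exact values are $\log(0.4) \approx -0.916$ for ${\rm logit}(E(y))$ and $-13/12 \approx -1.083$ for $E({\rm logit}(y))$, so the simulated numbers should land close to those. The size and sign of the gap depend on the distribution of $y$, but the two quantities coincide only in special cases.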