Beta Regression – Why Beta Regression Cannot Handle 0s and 1s in the Response Variable

Tags: beta distribution, beta-regression, generalized linear model, regression, zero inflation

Beta regression (i.e. a GLM with a beta distribution and, usually, the logit link function) is often recommended for a response (dependent) variable taking values between 0 and 1, such as fractions, ratios, or probabilities; see "Regression for an outcome (ratio or fraction) between 0 and 1".

However, it is always claimed that beta regression cannot be used as soon as the response variable equals 0 or 1 at least once. If it does, one needs to either use a zero/one-inflated beta model or apply some transformation to the response; see "Beta regression of proportion data including 1 and 0".

My question is: which property of beta distribution prevents beta regression from dealing with exact 0s and 1s, and why?

I am guessing it is because $0$ and $1$ are not in the support of the beta distribution. But for all shape parameters $\alpha>1$ and $\beta>1$, both zero and one are in the support of the beta distribution; it is only for smaller shape parameters that the density goes to infinity at one or both ends. And perhaps the sample data are such that the best-fitting $\alpha$ and $\beta$ would both turn out to be above $1$.

Does it mean that in some cases one could in fact use beta regression even with zeros/ones?

Of course, even when 0 and 1 are in the support of the beta distribution, the probability of observing exactly 0 or 1 is zero. But so is the probability of observing any other given countable set of values, so this cannot be an issue, can it? (Cf. this comment by @Glen_b.)

[Figure: beta distribution]

In the context of beta regression, the beta distribution is parameterized differently, but with $\phi=\alpha+\beta>2$ it should still be well-defined on $[0,1]$ for all $\mu$.
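For reference, here is a sketch of the mean-precision parameterization presumably meant here. It assumes the common convention $\mu=\alpha/(\alpha+\beta)$ and $\phi=\alpha+\beta$, which the text above does not spell out:

$$
\alpha=\mu\phi,\qquad \beta=(1-\mu)\phi,\qquad
f(x;\mu,\phi)=\frac{\Gamma(\phi)}{\Gamma(\mu\phi)\,\Gamma\big((1-\mu)\phi\big)}\,x^{\mu\phi-1}(1-x)^{(1-\mu)\phi-1},\qquad 0<x<1.
$$

Under this convention both shape parameters exceed $1$ only when $1/\phi<\mu<1-1/\phi$. So $\phi>2$ makes that range non-empty, but it does not cover every $\mu$.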


Best Answer

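Because the log-likelihood contains both $\log(x)$ and $\log(1-x)$, which are unbounded at $x=0$ and $x=1$ respectively. A single exact 0 or 1 in the data therefore sends the corresponding term to $\pm\infty$ (depending on the sign of its coefficient), and the log-likelihood can no longer be maximized. See equation (4) of Smithson & Verkuilen (2006), "A Better Lemon Squeezer? Maximum-Likelihood Regression With Beta-Distributed Dependent Variables".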
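A minimal sketch of this point in plain Python (standard library only). The function names `safe_log`, `beta_logpdf` and `loglik`, the toy data, and the mean-precision convention $\alpha=\mu\phi$, $\beta=(1-\mu)\phi$ are all assumptions made for illustration. They are not taken from the paper or from any beta regression package.

```python
import math

def safe_log(x):
    """log(x), extended so that log(0) is -inf instead of raising ValueError."""
    return math.log(x) if x > 0 else -math.inf

def beta_logpdf(x, mu, phi):
    """Log-density of Beta(mu*phi, (1-mu)*phi) at x, for 0 <= x <= 1.

    Assumes neither shape parameter is exactly 1 (that case gives 0 * inf = nan here).
    """
    a, b = mu * phi, (1.0 - mu) * phi
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return log_norm + (a - 1.0) * safe_log(x) + (b - 1.0) * safe_log(1.0 - x)

def loglik(xs, mu, phi):
    """Log-likelihood of an i.i.d. sample xs under a common (mu, phi)."""
    return sum(beta_logpdf(x, mu, phi) for x in xs)

interior = [0.2, 0.35, 0.5, 0.6]   # hypothetical data strictly inside (0, 1)
with_zero = interior + [0.0]       # the same data plus one exact zero

# Shapes are a = 2, b = 3 here, i.e. both above 1.
print(loglik(interior, mu=0.4, phi=5.0))   # a finite number
print(loglik(with_zero, mu=0.4, phi=5.0))  # -inf

# Any other parameters with both shapes above 1 give -inf as well.
for mu, phi in [(0.3, 10.0), (0.5, 4.0), (0.7, 20.0)]:
    print(mu, phi, loglik(with_zero, mu, phi))
```

So even when the shapes that fit the interior points best are above 1, a single zero gives every such parameter value a likelihood of zero. In this sketch a shape below 1 flips the sign and yields `+inf` instead, which is equally unusable for maximization.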