Is it OK to run a beta regression with proportion data (as the y variable) and categorical predictor variables? I know that R etc. will often do the conversion for you, but I just want to make sure the results are still valid.
All of the documentation that I can find says continuous predictor variables.
Best Answer
Beta regression is based on a linear predictor for the expectation of a beta-distributed response variable, by default with a logit link: $\mathrm{logit}(\mu_i) = x_i^\top \beta$. Often, a second linear predictor is added for the precision parameter: $\log(\phi_i) = z_i^\top \gamma$. Thus, all the usual strategies to build a linear predictor from a set of originally measured variables can be applied, e.g., polynomials, splines, interactions, etc. Categorical predictors are therefore just as valid here as in any other regression model, and the usual way to include them is via contrasts; see, e.g., https://stackoverflow.com/questions/2352617/how-and-why-do-you-use-contrasts
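To make the contrast idea concrete, here is a minimal base R sketch. The factor `g` and its levels are made up purely for illustration; the point is only to show how a categorical variable is turned into numeric columns of $x_i$ under treatment (dummy) coding versus sum (effect) coding.

```r
# Hypothetical three-level factor, for illustration only
g <- factor(c("a", "b", "c", "a", "b", "c"))

# Treatment (dummy) coding: the first level acts as the reference category
contr.treatment(levels(g))

# Sum (effect) coding: the columns sum to zero across levels
contr.sum(levels(g))

# Regressor matrix as it would enter the linear predictor x_i' beta,
# using R's default (treatment) contrasts for an unordered factor
X <- model.matrix(~ g)
X
```

Each non-reference level gets its own 0/1 column next to the intercept, so the factor enters the linear predictor like any other set of numeric regressors.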
In R, `model.matrix()` is used internally to set up the regressor matrix using `contrasts()`. This is employed in `lm()`, `glm()`, and also in `betareg()` (among many, many other packages). Many practitioners do not change the default contrasts, but two examples in `vignette("betareg", package = "betareg")` actually do. In `GasolineYield`, the `batch` variable uses the default treatment contrasts but with batch 10 (rather than 1) as the reference category. And in `ReadingSkills`, the `dyslexia` variable uses sum contrasts for an effect coding. (See also How to do regression with effect coding instead of dummy coding in R?)
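As a rough sketch of what this looks like in practice, the following assumes the betareg package is installed and that its `GasolineYield` and `ReadingSkills` data sets contain the variables `yield`, `batch`, `temp`, `accuracy`, and `dyslexia`, as used in the vignette. The object names `m_gas` and `m_read` are my own.

```r
library("betareg")  # assumption: the betareg package is installed

data("GasolineYield", package = "betareg")
data("ReadingSkills", package = "betareg")

# Inspect the contrasts attached to the two factors
contrasts(GasolineYield$batch)
contrasts(ReadingSkills$dyslexia)

# Categorical (batch) plus continuous (temp) predictor
m_gas <- betareg(yield ~ batch + temp, data = GasolineYield)
coef(m_gas)

# Purely categorical predictor
m_read <- betareg(accuracy ~ dyslexia, data = ReadingSkills)
summary(m_read)
```

The model formula is written exactly as it would be for `lm()` or `glm()`; the factor is expanded into contrast columns behind the scenes, and the coefficients are interpreted on the logit scale relative to whichever coding is in effect.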