Solved – Beta regression with categorical predictor variable

beta-regression, regression

Is it OK to run a beta regression with proportion data as the response (y) variable and categorical predictor variables? I know that R and similar software will often convert the categorical variables to dummy variables for you, but I want to make sure the results are still valid.

All of the documentation that I can find refers only to continuous predictor variables.

Best Answer

Beta regression is based on a linear predictor for the expectation of a beta-distributed response variable, by default with a logit link: $\mathrm{logit}(\mu_i) = x_i^\top \beta$. Often, a second linear predictor is added for the precision parameter, with a log link: $\log(\phi_i) = z_i^\top \gamma$. Because both are ordinary linear predictors, all the usual strategies for building a linear predictor from a set of originally measured variables can be applied, e.g., polynomials, splines, and interactions. So yes, categorical predictors are valid, and the usual way to include them is via contrasts; see, e.g., https://stackoverflow.com/questions/2352617/how-and-why-do-you-use-contrasts
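To make the contrast idea concrete, here is a minimal sketch in base R of how a factor is expanded into regressor columns. The data frame `d`, with its columns `group` and `y`, is a made-up toy example and not from the question. With the default treatment contrasts, a three-level factor contributes two 0/1 dummy columns, so the linear predictor becomes $\mathrm{logit}(\mu_i) = \beta_0 + \beta_1 \, 1[\text{group}_i = b] + \beta_2 \, 1[\text{group}_i = c]$.

```r
# Hypothetical toy data: a three-level factor and a proportion response in (0, 1)
d <- data.frame(
  group = factor(c("a", "b", "c", "a", "b", "c")),
  y     = c(0.12, 0.35, 0.60, 0.18, 0.41, 0.55)
)

# Contrast matrix attached to the factor (default: treatment contrasts,
# with the first level "a" as the reference category)
contrasts(d$group)

# Regressor matrix built from the formula: an intercept plus one
# 0/1 dummy column for each non-reference level
X <- model.matrix(~ group, data = d)
print(X)
```

The columns of `X` are what any model with a linear predictor, beta regression included, actually sees; the factor itself never enters the fit directly.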

In R, model.matrix() is used internally to set up the regressor matrix, coding each factor according to its contrasts(). This is what lm(), glm(), and betareg() do, along with the modeling functions in many other packages. Many practitioners never change the default contrasts, but two examples in vignette("betareg", package = "betareg") do. In GasolineYield, the batch factor uses treatment contrasts, but with batch 10 (rather than batch 1) as the reference category. In ReadingSkills, the dyslexia factor uses sum contrasts, i.e., effect coding. (See also: How to do regression with effect coding instead of dummy coding in R?)
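As a rough sketch of what this looks like in practice, the following code inspects the contrasts stored with the two factors mentioned above and fits a beta regression with one categorical and one continuous predictor. It assumes the betareg package is installed. The copy `g` and the refit `m1` are introduced here only for illustration; they show how the reference category could be switched to batch 1, which reparametrizes the batch effects without changing the underlying model.

```r
library(betareg)

data("GasolineYield", package = "betareg")
data("ReadingSkills", package = "betareg")

# Contrasts stored with each factor in the package data
contrasts(GasolineYield$batch)     # treatment coding, reference = batch 10
contrasts(ReadingSkills$dyslexia)  # sum (effect) coding

# Beta regression with a categorical (batch) and a continuous (temp) predictor
m <- betareg(yield ~ batch + temp, data = GasolineYield)
summary(m)

# Illustration only: switch the reference category to batch 1 on a copy
g <- GasolineYield
contrasts(g$batch) <- contr.treatment(10, base = 1)
m1 <- betareg(yield ~ batch + temp, data = g)
names(coef(m1))
```

Comparing `names(coef(m))` with `names(coef(m1))` shows which batch is absorbed into the intercept in each parametrization.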