Categorical Data – How to Choose Reference Level for Multiple Categorical Variables

categorical data, least squares, regression

For an OLS model with one categorical variable as a predictor, we usually create dummy variables. When the variable has more than 2 categories, one of the dummies is dropped and absorbed into the intercept of the model; that category is taken as the reference level. How to interpret such a reference level has been explained thoroughly in part of this answer. But let's say we have multiple categorical variables, each with more than 2 categories. We create dummies for all of them and use one dummy from each variable as a reference. How must the parameters and intercept of the model be interpreted in this case? Put differently (and rather bluntly), what does the reference refer to now?

Best Answer

How in this case must the parameters and intercept of the model be interpreted?

It's just a generalization of the situation with a single multi-category predictor.

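The intercept is the estimated response when all such predictors are simultaneously at their reference levels (and any continuous predictors are at zero). Each regression coefficient for a non-reference level of a predictor is the estimated difference from that same predictor's reference level, with the other predictors held fixed. This reading assumes the model has no interaction terms between the predictors.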
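A small numerical check may make this concrete. The sketch below is only an illustration: the two predictors (`color`, `size`), their levels, the effect sizes, and the baseline of 10 are all made-up assumptions, and the data are noise-free so that least squares recovers the values exactly. The reference levels are assumed to be `red` and `S`.

```python
import itertools
import numpy as np

# Hypothetical additive effects; "red" and "S" are the chosen reference levels.
color_effect = {"red": 0.0, "green": 2.0, "blue": -1.0}
size_effect = {"S": 0.0, "M": 3.0, "L": 5.0}
baseline = 10.0  # response for the cell (color=red, size=S)

# One noise-free observation per combination of levels (3 x 3 = 9 rows).
rows = list(itertools.product(color_effect, size_effect))
y = np.array([baseline + color_effect[c] + size_effect[s] for c, s in rows])

# Design matrix: intercept plus one dummy per non-reference level.
columns = ["intercept", "green", "blue", "M", "L"]
X = np.array(
    [[1.0, c == "green", c == "blue", s == "M", s == "L"] for c, s in rows],
    dtype=float,
)

# Ordinary least squares fit.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
estimates = dict(zip(columns, coef))

for name, value in estimates.items():
    print(f"{name:>9}: {value:.2f}")
```

Under these assumptions the fitted intercept equals the response for the `red`/`S` cell, the `green` and `blue` coefficients are differences from `red`, and the `M` and `L` coefficients are differences from `S`. The prediction for any other cell, say `green`/`L`, is the intercept plus the two relevant coefficients.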