GLM Regression – Setting Reference Levels in Generalized Linear Model Regression

Tags: categorical-encoding, generalized linear model, references, regression

In GLM regression I have always been told to set the reference level of categorical/ordinal/dummy variables to the level with the most exposure (the level with the most data), because this somehow makes the model more stable. Is there any statistical reason behind this, or is it just historical practice?

I can understand that one wants a "base level" (reference level) in the model that is most likely to be correct, but if you are not sure about the other levels, then surely they should not be included in the regression in the first place?

Best Answer

I would mostly choose a reference level that is meaningful in the applied context, that is, a level that is actually interesting as a reference in the application. So, in an experiment with several treatments and one control, I would choose the control as the reference level. In a marketing context with many products, I would choose a market leader as the reference (or, if I am an interested party, my own product).

But if some levels have very few observations, using such a level as the reference will give all the estimated contrasts$^\dagger$ large standard errors, because every contrast inherits the uncertainty of the poorly estimated reference level. That makes interpretation difficult, so some compromise must be made.
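The effect on the standard errors can be sketched numerically. This is a minimal illustration, not a general GLM fit: it assumes a Poisson model with log link and a single categorical predictor, where the approximate (Wald) variance of the estimated log mean of level $j$ is one over the total count observed in that level. The level names and counts are made up for the example.

```python
import math

# Hypothetical total observed counts per level (illustrative, not from the post).
totals = {"A": 1000, "B": 400, "C": 10}

def contrast_se(totals, ref):
    """Approximate standard error of each log-rate contrast (level vs. ref).

    Assumes a one-factor Poisson log-link model: the estimated log mean of
    level j has approximate variance 1 / totals[j], and the levels are
    estimated independently, so the variances of the two levels add.
    """
    return {
        level: math.sqrt(1.0 / total + 1.0 / totals[ref])
        for level, total in totals.items()
        if level != ref
    }

se_ref_a = contrast_se(totals, "A")  # well-populated reference
se_ref_c = contrast_se(totals, "C")  # sparse reference

print("reference A:", se_ref_a)
print("reference C:", se_ref_c)
```

With the sparse level `C` as reference, the term `1 / totals["C"]` enters every contrast, so all reported standard errors are large. With `A` as reference, only the contrast that involves `C` is imprecise.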

But what you have been told:

> because this somehow makes the model more stable

is not true. Irrespective of which level you choose as the reference, the model being estimated is the same: the fitted values and the likelihood do not change, so the model is equally stable or unstable either way. The choice of reference level only helps interpretation; it does not address numerical issues. And whatever contrasts you are most interested in can always be computed after the fit. It is just convenient if they can be read directly off the standard output.
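A small sketch of both claims, under a simplifying assumption: a Poisson log-link model with one categorical predictor, for which the maximum likelihood fit has a closed form (each level's fitted mean is its sample mean). The data and the helper names `fit` and `fitted_means` are hypothetical.

```python
import math

# Hypothetical data: (total count, number of observations) per level.
data = {"A": (1000, 500), "B": (400, 100), "C": (10, 4)}

def fit(data, ref):
    """Closed-form MLE under treatment coding with the given reference level."""
    log_mean = {k: math.log(s / n) for k, (s, n) in data.items()}
    intercept = log_mean[ref]
    coefs = {k: log_mean[k] - intercept for k in data if k != ref}
    return intercept, coefs

def fitted_means(intercept, coefs, levels):
    """Fitted mean per level; the reference level has coefficient 0."""
    return {k: math.exp(intercept + coefs.get(k, 0.0)) for k in levels}

int_a, coef_a = fit(data, "A")
int_c, coef_c = fit(data, "C")

# Same model: fitted means agree whichever level is the reference.
means_a = fitted_means(int_a, coef_a, data)
means_c = fitted_means(int_c, coef_c, data)

# The B-vs-A contrast, read directly (reference A) or recovered (reference C).
b_vs_a_direct = coef_a["B"]
b_vs_a_recovered = coef_c["B"] - coef_c["A"]

# Its standard error also matches once the covariance is accounted for.
# Approximate variance of a level's log mean is 1 / (total count); with
# reference C, the two coefficients share the term 1 / total_C.
s = {k: v[0] for k, v in data.items()}
var_b = 1 / s["B"] + 1 / s["C"]
var_a = 1 / s["A"] + 1 / s["C"]
cov_ab = 1 / s["C"]
se_recovered = math.sqrt(var_b + var_a - 2 * cov_ab)
se_direct = math.sqrt(1 / s["B"] + 1 / s["A"])

print(means_a, means_c)
print(b_vs_a_direct, b_vs_a_recovered)
print(se_direct, se_recovered)
```

In a general GLM the same post-fit computation would use the full estimated covariance matrix of the coefficients rather than this closed form.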

$^\dagger\colon$ When using treatment contrasts (treatment coding), all the estimated parameters are really contrasts comparing level $j$ to the reference level.
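Written out, with notation introduced here only for illustration ($\eta$ for the linear predictor, $r$ for the reference level, $x_i$ for the level of observation $i$), treatment coding for a single factor is

$$\eta_i = \beta_0 + \sum_{j \neq r} \beta_j \,\mathbf{1}\{x_i = j\}, \qquad \text{so that} \qquad \beta_j = \eta(j) - \eta(r).$$

Switching the reference from $r$ to another level $r'$ only reparameterizes the same linear predictor. With the convention $\beta_r = 0$,

$$\beta_0' = \beta_0 + \beta_{r'}, \qquad \beta_j' = \beta_j - \beta_{r'} \quad (j \neq r').$$

The map between the two coefficient vectors is linear and invertible, so $\eta_i$, the fitted values, and the likelihood are unchanged.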
