Solved – Dummy variables in multiple regression, why use an intercept

anova, categorical-data, categorical-encoding, multiple-regression, regression

When performing a multiple regression with dummy variables, is it really necessary to include an intercept term in the design matrix?

By dummy variables, I mean indicator variables: a one in the design matrix when some effect is present, and a zero when it is not. It seems to me that without the intercept the OLS solution is simpler to interpret. Instead of

$\beta_{0} = \mu_{A}$ (where $\beta_{0}$ is the intercept)

$\beta_{1} = \mu_{B} - \mu_{A}$

$\beta_{2} = \mu_{C} - \mu_{A}$

etc.

We have

$\beta_{1} = \mu_{A}$

$\beta_{2} = \mu_{B}$

$\beta_{3} = \mu_{C}$

etc.
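To make the two parameterizations concrete, here is a small numpy sketch (the group means, noise level, and sample size are made up for illustration) that fits the same one-way layout both ways and recovers exactly the coefficient interpretations described above:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: three groups A, B, C with chosen true means
mu = {"A": 1.0, "B": 3.0, "C": 5.0}
n = 200
groups = rng.choice(["A", "B", "C"], size=n)
y = np.array([mu[g] for g in groups]) + rng.normal(0.0, 0.1, n)

a = (groups == "A").astype(float)
b = (groups == "B").astype(float)
c = (groups == "C").astype(float)

# With an intercept, A is the reference level:
#   beta_int = [mean(A), mean(B) - mean(A), mean(C) - mean(A)]
X_int = np.column_stack([np.ones(n), b, c])
beta_int, *_ = np.linalg.lstsq(X_int, y, rcond=None)

# Without an intercept ("cell means" coding), one indicator per group:
#   beta_cell = [mean(A), mean(B), mean(C)]
X_cell = np.column_stack([a, b, c])
beta_cell, *_ = np.linalg.lstsq(X_cell, y, rcond=None)
```

Both design matrices span the same column space, so the fitted values agree exactly; only the coordinates (and hence the meaning of each coefficient) differ.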

Do the computations of $R^{2}$, the F-statistic, and the t-statistics change?

What if a continuous independent variable is then included?

Best Answer

The predictions, residuals, full-versus-reduced model tests, and so on will not be affected by the change you propose; what does change is the interpretation of, and the default tests on, the individual terms.

Most regression routines automatically test whether each term equals 0. That test is meaningful when a term represents the difference between two group means (which is what we get when we include an intercept), but is testing whether each group mean equals 0 meaningful? Usually we want to know whether the groups differ from each other. The same goes for confidence intervals: if every term just represents a mean, we get confidence intervals for the individual means, and people then try to judge the size of a difference by checking whether those intervals overlap. That is far inferior to looking at a confidence interval on the difference itself.
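The point about the default t-tests can be seen directly. A sketch with hypothetical data, computing OLS and the per-coefficient t statistics by hand so no modeling package is assumed (note, as an aside on the $R^{2}$ question, that many packages report a different $R^{2}$ for no-intercept fits because they switch to the uncentered total sum of squares):

```python
import numpy as np

rng = np.random.default_rng(1)
means = {"A": 10.0, "B": 10.5, "C": 12.0}  # hypothetical group means
groups = rng.choice(["A", "B", "C"], size=300)
y = np.array([means[g] for g in groups]) + rng.normal(0.0, 1.0, len(groups))

def ols_t(X, y):
    """OLS fit plus per-coefficient t statistics (beta_j / se_j)."""
    beta = np.linalg.solve(X.T @ X, X.T @ y)
    resid = y - X @ beta
    n, p = X.shape
    sigma2 = resid @ resid / (n - p)
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
    return beta, beta / se, resid

a, b, c = [(groups == g).astype(float) for g in "ABC"]
X1 = np.column_stack([np.ones_like(a), b, c])  # reference (intercept) coding
X2 = np.column_stack([a, b, c])                # cell-means (no-intercept) coding

beta1, t1, r1 = ols_t(X1, y)
beta2, t2, r2 = ols_t(X2, y)

# Residuals (hence fitted values and full-model F-tests) are identical,
# but the default t-tests answer different questions:
#   t1[1] tests mu_B - mu_A = 0  (a comparison between groups),
#   t2[1] tests mu_B = 0         (rarely interesting when means are far from 0).
```

With group means near 10 and unit noise, the cell-means t statistics are all huge while the reference-coded ones measure between-group differences, which is usually what the analysis is after.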
