Regression – Does Normal Errors Assumption Imply Y is Also Normal?

Tags: assumptions, regression

Unless I'm mistaken, in a linear model the response is assumed to consist of a systematic component and a random component, and the error term captures the random component. Therefore, if we assume that the error term is normally distributed, doesn't that imply that the response is also normally distributed? I think it does, but then statements such as the one below seem rather confusing:

And you can see clearly that the only assumption of "normality" in this model is that the residuals (or "errors" $\epsilon_i$) should be normally distributed. There is no assumption about the distribution of the predictor $x_i$ or the response variable $y_i$.

Source: Predictors, responses and residuals: What really needs to be normally distributed?

Best Answer

The standard OLS model is $Y = X \beta + \varepsilon$ with $\varepsilon \sim \mathcal N(\vec 0, \sigma^2 I_n)$ for a fixed $X \in \mathbb R^{n \times p}$.

This does indeed mean that $Y|\{X, \beta, \sigma^2\} \sim \mathcal N(X\beta, \sigma^2 I_n)$, although this is a consequence of our assumption on the distribution of $\varepsilon$, rather than actually being the assumption. Also keep in mind that I'm talking about the conditional distribution of $Y$, not the marginal distribution of $Y$. I'm focusing on the conditional distribution because I think that's what you're really asking about.
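To make the conditional statement concrete, here is a minimal NumPy simulation sketch. It holds a small, hypothetical design matrix `X` fixed, uses made-up values for $\beta$ and $\sigma$, and draws many independent copies of $Y = X\beta + \varepsilon$. Each copy is one draw of the whole vector $Y$; averaging across copies should recover $E(Y|X) = X\beta$ and $\operatorname{Cov}(Y|X) = \sigma^2 I_n$.

```python
import numpy as np

def simulate_conditional_y(X, beta, sigma, n_reps, seed=0):
    """Draw n_reps independent copies of Y = X beta + eps, eps ~ N(0, sigma^2 I),
    holding the design matrix X fixed."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    eps = rng.normal(0.0, sigma, size=(n_reps, n))  # each row is one error vector
    return X @ beta + eps  # broadcasting: (n,) + (n_reps, n) -> (n_reps, n)

# Hypothetical fixed design: intercept plus one covariate, n = 4 observations.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
beta = np.array([2.0, 0.5])  # hypothetical coefficients
sigma = 1.5                  # hypothetical error standard deviation

Y = simulate_conditional_y(X, beta, sigma, n_reps=200_000)
print(Y.mean(axis=0))           # close to X @ beta = [2.0, 2.5, 3.0, 3.5]
print(np.cov(Y, rowvar=False))  # close to sigma^2 * I = 2.25 * I
```

Note that each element of $Y$ has its own mean; only the spread and the lack of correlation are shared.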

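I think the confusing part is that this doesn't mean that a histogram of $Y$ will look normal. We are saying that the entire vector $Y$ is a single draw from a multivariate normal distribution in which each element has a potentially different mean $E(Y_i|X_i) = X_i^T\beta$. This is not the same as being an iid normal sample. The errors $\varepsilon$ actually are an iid sample, so a histogram of them would look normal. We never observe the errors directly, but the residuals $\hat\varepsilon = Y - X\hat\beta$ estimate them (under the model they are also normal, though with covariance $\sigma^2(I_n - H)$, where $H = X(X^TX)^{-1}X^T$ is the hat matrix, rather than exactly iid), and that's why we do a QQ plot of the residuals, not the response.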
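A quick numerical illustration, sketched in Python under made-up settings: a hypothetical covariate spread evenly over $[0, 1]$, a large hypothetical slope, and iid $\mathcal N(0, 1)$ errors. Instead of eyeballing a histogram or QQ plot, it uses sample excess kurtosis (about 0 for normal data, $-1.2$ for uniform data) as a crude stand-in. The pooled $y$ values are far from normal, while the OLS residuals look normal.

```python
import numpy as np

def excess_kurtosis(a):
    """Sample excess kurtosis: about 0 for normal data, -1.2 for uniform data."""
    d = a - a.mean()
    return np.mean(d**4) / np.mean(d**2) ** 2 - 3.0

rng = np.random.default_rng(1)
n = 100_000
x = np.linspace(0.0, 1.0, n)          # hypothetical evenly spread covariate
X = np.column_stack([np.ones(n), x])  # design matrix with an intercept column
beta = np.array([0.0, 20.0])          # hypothetical coefficients
eps = rng.normal(0.0, 1.0, size=n)    # iid N(0, 1) errors
y = X @ beta + eps

# Fit OLS and compute the residuals y - X beta_hat.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta_hat

print(excess_kurtosis(y))      # clearly negative (around -1.1): pooled y is far from normal
print(excess_kurtosis(resid))  # close to 0: the residuals look normal
```

The shape of the pooled $y$ is driven mostly by how the means $X_i^T\beta$ are spread out, which the model says nothing about.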

Here's an example: suppose we are measuring height $H$ for a sample of 6th graders and 12th graders. Our model is $H_i = \beta_0 + \beta_1 I(\text{12th grader}) + \varepsilon_i$ with $\varepsilon_i \overset{\text{iid}}{\sim} \mathcal N(0, \sigma^2)$. If we look at a histogram of the $H_i$, we'll probably see a bimodal distribution, with one peak for 6th graders and one peak for 12th graders, but that doesn't represent a violation of our assumptions.
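The height example can be simulated in a few lines. This is a sketch only: the group means (145 cm and 175 cm), $\sigma = 6$ cm, and the group sizes are hypothetical numbers chosen to make the two peaks obvious, and the "histogram" is reduced to counting heights near each group mean versus near the midpoint.

```python
import numpy as np

rng = np.random.default_rng(2)
n_per_group = 5_000
# Hypothetical values: beta0 = 145 cm (6th-grade mean), beta1 = 30 cm, sigma = 6 cm.
beta0, beta1, sigma = 145.0, 30.0, 6.0

grade12 = np.repeat([0.0, 1.0], n_per_group)     # indicator I(12th grader)
eps = rng.normal(0.0, sigma, size=grade12.size)  # iid N(0, sigma^2) errors
H = beta0 + beta1 * grade12 + eps

# Crude histogram check: many heights near each group mean, few in between.
near_6th = np.sum(np.abs(H - 145.0) < 2.0)
near_12th = np.sum(np.abs(H - 175.0) < 2.0)
in_between = np.sum(np.abs(H - 160.0) < 2.0)
print(near_6th, in_between, near_12th)  # in_between is much smaller: two peaks

# Fitting the model by OLS recovers the group means; the residuals form one
# normal-looking bump with spread close to sigma.
X = np.column_stack([np.ones_like(grade12), grade12])
b_hat, *_ = np.linalg.lstsq(X, H, rcond=None)
resid = H - X @ b_hat
print(b_hat)                        # close to [145, 30]
print(np.std(H), np.std(resid))     # H is much more spread out than the residuals
```

The bimodality lives entirely in the systematic part $\beta_0 + \beta_1 I(\text{12th grader})$; once it is subtracted, what remains matches the normal-error assumption.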
