Why are residual plots constructed using the residuals vs. the predicted values?

data visualization, regression, residuals

I would like to know why residual plots show the residuals against the predicted values of $y$ rather than against the observed $y$ itself.

Best Answer

The standard (OLS) linear regression model is:
$$ Y = \beta_0 + \beta_1X + \varepsilon \\ \text{where }\ \varepsilon \sim \mathcal N(0, \sigma^2) $$

The important thing to recognize here is that the error term is normally distributed with a variance that does not depend on $X$. Since the fitted values are $\hat Y = \hat\beta_0 + \hat\beta_1X$, the residuals [1] of our model, $Y - \hat Y$, can be used as estimates of the errors of the data generating process, and we can inspect the plot of the residuals vs. the fitted values to assess the assumption of constant variance (homoscedasticity). To understand this more fully, it may help to read my answer to "What does having 'constant variance' in a linear regression model mean?"

On the other hand, it is not clear what a plot of the residuals vs. the raw $Y$ values would illustrate. In fact, we generally expect some degree of correlation between the residuals and $Y$. (It may help to read this excellent CV thread: "What is the expected correlation between residual and the dependent variable?")
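The correlation between the residuals and $Y$ can be made concrete with a short derivation. This is a sketch, assuming the model is fit by OLS with an intercept; the symbols $e = Y - \hat Y$ (raw residuals), $R^2$ (coefficient of determination), and the hats denoting sample quantities are notation introduced here, not taken from the answer above:

$$
\begin{aligned}
\widehat{\operatorname{Cov}}(e, \hat Y) &= 0 \quad \text{(from the OLS normal equations)} \\
\widehat{\operatorname{Cov}}(e, Y) &= \widehat{\operatorname{Cov}}(e, \hat Y + e) = \widehat{\operatorname{Var}}(e) \\
\widehat{\operatorname{Corr}}(e, Y) &= \frac{\widehat{\operatorname{Var}}(e)}{\sqrt{\widehat{\operatorname{Var}}(e)\,\widehat{\operatorname{Var}}(Y)}} = \sqrt{\frac{\widehat{\operatorname{Var}}(e)}{\widehat{\operatorname{Var}}(Y)}} = \sqrt{1 - R^2}
\end{aligned}
$$

So the residuals are uncorrelated with the fitted values by construction, but positively correlated with $Y$ whenever $R^2 < 1$, even if every model assumption holds.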

In addition, the plot of residuals vs. fitted values can be used to help identify a misspecified functional form [2]. Again, since we expect the residuals and $Y$ to be correlated, the plot of the residuals vs. the raw $Y$ values will be misleading on this issue.
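As a rough illustration of how residuals vs. fitted values can expose a misspecified functional form, here is a minimal Python sketch using only the standard library. Everything in it is an assumption made for illustration: the quadratic data generating process, the seed, and the helper names `ols_fit` and `mean_residual_by_fitted_third` are hypothetical and are not the code referred to elsewhere in this answer. It fits a straight line to curved data and averages the residuals within thirds of the fitted values, which is a numeric stand-in for looking at the plot.

```python
import random


def ols_fit(x, y):
    """Return (intercept, slope) of the least-squares line through (x, y)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def mean_residual_by_fitted_third(x, y):
    """Fit a straight line, then average the residuals within the lowest,
    middle, and highest thirds of the fitted values."""
    b0, b1 = ols_fit(x, y)
    fitted = [b0 + b1 * xi for xi in x]
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    pairs = sorted(zip(fitted, resid))  # order observations by fitted value
    k = len(pairs) // 3
    groups = [pairs[:k], pairs[k:2 * k], pairs[2 * k:]]
    return [sum(r for _, r in g) / len(g) for g in groups]


rng = random.Random(1)  # arbitrary seed, chosen only for reproducibility
x = [rng.uniform(0, 10) for _ in range(600)]
# Hypothetical truth: quadratic in x with constant error variance,
# deliberately fit below with a (misspecified) straight line.
y = [1 + 0.5 * xi ** 2 + rng.gauss(0, 2) for xi in x]

low, mid, high = mean_residual_by_fitted_third(x, y)
print(f"mean residual by fitted-value third: {low:.2f}, {mid:.2f}, {high:.2f}")
```

Under these assumptions the mean residual is positive in the outer thirds and negative in the middle third, which is the U-shaped pattern that signals a missing curvature term. For a correctly specified model, all three means would sit close to zero.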

Using the code and data from my answer linked above, consider these four plots:

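[Figure: a 2×2 grid of plots. Top row: residuals vs. fitted values. Bottom row: residuals vs. raw $Y$ values. Left column: homoscedastic model. Right column: heteroscedastic model.]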

Neither model is misspecified, but the model represented in the right two plots does have heteroscedasticity. The top plots (residuals vs. fitted values) help you identify the clear heteroscedasticity on the right, without leading you to worry about a possible misspecified functional form. The bottom plots (residuals vs. raw $Y$ values) incorrectly connote misspecification, and do so more strongly than they inform us of the status of the constant variance assumption.
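The comparison can also be approximated numerically. The following Python sketch is not the code or data behind the plots above. It simulates two correctly specified linear models under assumed parameters (the coefficients, error scales, seed, and the helper names `fit_line`, `corr`, and `spread_ratio` are all hypothetical) and summarizes what each kind of plot would show.

```python
import math
import random


def fit_line(x, y):
    """Fit y on x by OLS with an intercept; return (fitted, residuals)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    fitted = [b0 + b1 * xi for xi in x]
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    return fitted, resid


def corr(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    saa = sum((ai - ma) ** 2 for ai in a)
    sbb = sum((bi - mb) ** 2 for bi in b)
    return sab / math.sqrt(saa * sbb)


def spread_ratio(fitted, resid):
    """SD of residuals in the upper half of fitted values divided by the
    SD in the lower half; values near 1 suggest constant variance."""
    def sd(vals):
        m = sum(vals) / len(vals)
        return math.sqrt(sum((v - m) ** 2 for v in vals) / (len(vals) - 1))

    pairs = sorted(zip(fitted, resid))
    half = len(pairs) // 2
    lower = [r for _, r in pairs[:half]]
    upper = [r for _, r in pairs[half:]]
    return sd(upper) / sd(lower)


rng = random.Random(42)  # arbitrary seed
n = 1000
x = [rng.uniform(1, 10) for _ in range(n)]
# Both models are linear in x (no misspecification); only the error differs.
y_homo = [2 + 3 * xi + rng.gauss(0, 4) for xi in x]           # constant SD
y_hetero = [2 + 3 * xi + rng.gauss(0, 0.8 * xi) for xi in x]  # SD grows with x

summary = {}
for label, y in (("homoscedastic", y_homo), ("heteroscedastic", y_hetero)):
    fitted, resid = fit_line(x, y)
    summary[label] = {
        "corr_resid_fitted": corr(resid, fitted),
        "corr_resid_y": corr(resid, y),
        "spread_ratio": spread_ratio(fitted, resid),
    }
    print(label, summary[label])
```

In this sketch the residuals are uncorrelated with the fitted values in both models, so only the spread ratio separates them, which is what the top row conveys. The residuals are clearly correlated with $Y$ in both models, which is the spurious trend visible in the bottom row. Passing the same `fitted`, `resid`, and `y` lists to any scatter-plot routine would give a four-panel layout like the one described.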

1. We actually use the standardized residuals here.
2. This becomes more difficult with increasing numbers of $X$ variables, though.