Solved – How to have non-random patterns in the plot of simple linear regression residuals vs the predictor variable

Tags: diagnostic, linear model, regression, residuals

A) When considering a simple linear regression model, it is important to check the linearity assumption. Graphing the residuals vs the predictor variable can often give a good idea of whether or not this is true. A non-random pattern suggests that a simple linear model is not appropriate.

B) On the other hand, residuals have the property that the correlation between the residuals and the observations of the predictor variable is zero.

It seems to me that B) contradicts A): if the residuals and the predictor are uncorrelated, how can we see a non-random pattern in the plot? Can you explain to me why this is not the case?

Best Answer

Correlation measures only linear dependence, but two variables can also be related non-linearly. Here is the standard plot from the Wikipedia page on correlation and linear dependence:

[Image: scatterplots from the Wikipedia page, each shown with its correlation coefficient]

The bivariate distributions in the bottom row all have zero correlation but show clear patterns. Thus, although standard (OLS) regression with an intercept forces the correlation between the residuals and the predictor to be zero (and likewise for the fitted values, which in simple linear regression are just a linear function of the predictor), there can still be a detectable pattern indicating that the functional form is mis-specified. Consider the following plot, taken from this CV question: How do I interpret this fitted vs residuals plot? As I argue in my answer there, it provides evidence of a mis-specified functional form.

[Image: fitted vs residuals plot from the linked CV question]
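
To see the same thing numerically, here is a minimal sketch in Python with NumPy. The data are made up for illustration: the quadratic coefficients, noise level, and seed are arbitrary assumptions, not taken from either plot. A straight line is fitted to data whose true relationship is quadratic, and the residuals are then correlated with the predictor and with its square.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only

# Hypothetical data: the true relationship is quadratic, not linear.
x = np.linspace(-3, 3, 201)
y = 1.0 + 2.0 * x + 1.5 * x**2 + rng.normal(scale=0.5, size=x.size)

# Fit a straight line (with intercept) by ordinary least squares.
slope, intercept = np.polyfit(x, y, deg=1)
residuals = y - (intercept + slope * x)

# Linear association with the predictor: zero up to rounding error.
corr_with_x = np.corrcoef(residuals, x)[0, 1]

# Association with the squared predictor: what the linear fit missed.
corr_with_x2 = np.corrcoef(residuals, x**2)[0, 1]

print(f"corr(residuals, x)   = {corr_with_x:.2e}")
print(f"corr(residuals, x^2) = {corr_with_x2:.3f}")
```

The first correlation comes out as zero up to floating-point error, exactly as B) says, while the second is large. Plotting `residuals` against `x` would show a U-shaped curve, which is the kind of non-random pattern A) refers to.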