Solved – Interpreting the residuals vs. fitted values plot for verifying the assumptions of a linear model

Tags: assumptions, regression, residuals

Consider the following figure from Faraway's Linear Models with R (2005, p. 59).

[Figure: three residuals vs. fitted values plots from Faraway (2005, p. 59)]

The first plot seems to indicate that the residuals and the fitted values are uncorrelated, as they should be in a homoscedastic linear model with normally distributed errors. Therefore, the second and third plots, which seem to indicate dependency between the residuals and the fitted values, suggest a different model.

But why does the second plot suggest, as Faraway notes, a heteroscedastic linear model, while the third plot suggests a non-linear model?

The second plot seems to indicate that the absolute value of the residuals is strongly positively correlated with the fitted values, whereas no such trend is evident in the third plot. So if it were the case that, theoretically speaking, in a heteroscedastic linear model with normally distributed errors

$$
\mbox{Cor}\left(\mathbf{e},\hat{\mathbf{y}}\right) = \left[\begin{array}{ccc}1 & \cdots & 1 \\ \vdots & \ddots & \vdots \\ 1 & \cdots & 1\end{array}\right]
$$

(where the expression on the left is the correlation matrix between the residuals and the fitted values) this would explain why the second and third plots agree with Faraway's interpretations.

But is this the case? If not, how else can Faraway's interpretations of the second and third plots be justified? Also, why does the third plot necessarily indicate non-linearity? Isn't it possible that it is linear, but that the errors are either not normally distributed, or else that they are normally distributed, but do not center around zero?
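The conjecture is easy to check numerically. Below is a quick sketch in Python/numpy (the coefficients and noise scale are arbitrary choices for illustration, not from Faraway): a heteroscedastic linear model is simulated, fitted by ordinary least squares, and both correlations are computed.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.uniform(0, 10, n)

# Heteroscedastic linear model: the error spread grows with x
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.2 + 0.3 * x)

# Ordinary least squares fit with an intercept
X = np.column_stack([np.ones(n), x])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
fitted = X @ beta
resid = y - fitted

# With an intercept, OLS residuals sum to zero and are orthogonal to the
# fitted values, so their sample correlation is exactly 0 (up to floating
# point) -- even under heteroscedasticity.
print(np.corrcoef(resid, fitted)[0, 1])

# The *absolute* residuals, by contrast, do track the fitted values here,
# which is what the second plot is picking up.
print(np.corrcoef(np.abs(resid), fitted)[0, 1])
```

Whatever the error variance does, orthogonality of OLS residuals to the fitted values forces their sample correlation to zero; it is the absolute (or squared) residuals that carry the heteroscedasticity signal.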

Best Answer

Below are those residual plots with the approximate mean and spread of points (limits that include most of the values) at each value of fitted (and hence of $x$) marked in - to a rough approximation indicating the conditional mean (red) and conditional mean $\pm$ (roughly!) twice the conditional standard deviation (purple):

[Figure: the three diagnostic plots with the approximate mean and spread at each value of fitted marked in]

  • The second plot shows that the mean residual doesn't change with the fitted values (and so doesn't change with $x$), but the spread of the residuals (and hence of the $y$'s about the fitted line) increases as the fitted values (or $x$) increase. That is, the spread is not constant: heteroskedasticity.

  • The third plot shows that the residuals are mostly negative when the fitted value is small, positive when the fitted value is in the middle, and negative when the fitted value is large. That is, the spread is approximately constant, but the conditional mean is not - the fitted line doesn't describe how $y$ behaves as $x$ changes, because the relationship is curved.
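The two bullet points can be reproduced in a small simulation (a Python/numpy sketch; the particular coefficients and noise models are made-up stand-ins for the two situations). Binning the residuals by fitted value gives rough analogues of the red (local mean) and purple (local spread) lines drawn on the plots.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000
x = np.sort(rng.uniform(0, 10, n))


def ols_resid(x, y):
    """Fit y = b0 + b1*x by least squares; return (fitted, residuals)."""
    X = np.column_stack([np.ones_like(x), x])
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    fitted = X @ beta
    return fitted, y - fitted


# Second plot's situation: straight-line mean, spread growing with x
y_het = 1 + 2 * x + rng.normal(0.0, 0.2 + 0.4 * x)
# Third plot's situation: curved mean, roughly constant spread
y_curv = 1 + 2 * x - 0.3 * (x - 5) ** 2 + rng.normal(0.0, 1.0, n)

results = {}
for label, y in [("heteroscedastic", y_het), ("curved", y_curv)]:
    fitted, resid = ols_resid(x, y)
    # Bin the residuals by fitted value and compute the local mean and
    # spread, mimicking the red and purple lines on the annotated plots
    bins = np.array_split(np.argsort(fitted), 10)
    means = np.array([resid[b].mean() for b in bins])
    sds = np.array([resid[b].std() for b in bins])
    results[label] = (means, sds)
    print(label, "binned means:", np.round(means, 2))
    print(label, "binned sds:  ", np.round(sds, 2))
```

In the heteroscedastic case the binned means stay near zero while the binned standard deviations grow steadily; in the curved case the binned standard deviations stay comparatively stable while the binned means go negative, then positive, then negative.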

Isn't it possible that it is linear, but that the errors are either not normally distributed, or else that they are normally distributed, but do not center around zero?

Not really*; in those situations the plots look different from the third plot.

(i) If the errors were normal but not centered at zero, but at $\theta$, say, then the intercept would pick up the mean error, and so the estimated intercept would be an estimate of $\beta_0+\theta$ (that would be its expected value, but it is estimated with error). Consequently, your residuals would still have conditional mean zero, and so the plot would look like the first plot above.
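Point (i) is easy to verify in a simulation (a Python/numpy sketch; $\beta_0$, $\beta_1$ and $\theta$ are arbitrary values chosen for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5000
x = rng.uniform(0, 10, n)

beta0, beta1, theta = 1.0, 2.0, 3.0
# Errors are normal, but centered at theta rather than at zero
y = beta0 + beta1 * x + rng.normal(theta, 1.0, n)

X = np.column_stack([np.ones(n), x])
b0_hat, b1_hat = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - (b0_hat + b1_hat * x)

# The intercept estimate absorbs the mean error: it estimates beta0 + theta
print(b0_hat)
# ...and the residuals are re-centered at zero by construction
print(resid.mean())
```

The estimated intercept lands near $\beta_0+\theta$, and the residuals have mean zero, so the residual plot is indistinguishable from the well-behaved first plot.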

(ii) If the errors are not normally distributed the pattern of dots might be densest somewhere other than the center line (if the data were skewed), say, but the local mean residual would still be near 0.

[Figure: residual plot with non-normal (skewed) errors]

Here the purple lines still represent a (very) roughly 95% interval, but it's no longer symmetric. (I'm glossing over a couple of issues to avoid obscuring the basic point here.)
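Point (ii) can be illustrated the same way (a numpy sketch; the centered exponential is one arbitrary choice of skewed error):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5000
x = rng.uniform(0, 10, n)

# Skewed errors: an exponential shifted to have mean zero
errors = rng.exponential(2.0, n) - 2.0
y = 1 + 2 * x + errors

X = np.column_stack([np.ones(n), x])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
fitted = X @ beta
resid = y - fitted

# The local mean residual stays near zero in every bin...
bins = np.array_split(np.argsort(fitted), 10)
local_means = np.array([resid[b].mean() for b in bins])
print(np.round(local_means, 2))

# ...but the points are densest below the zero line: for a mean-zero
# exponential, well over half of the residuals are negative
print((resid < 0).mean())
```

The binned means hover near zero across the whole range of fitted values, yet roughly 63% of the residuals are negative - the dense band of points sits below the center line, which is the asymmetry the purple lines show.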

* It's not necessarily impossible -- if you have an "error" term that doesn't really behave like errors - say where $x$ and $y$ are related to them in just the right way - you might be able to produce patterns something like these. However, we make assumptions about the error term, such as that it's not related to $x$, for example, and has zero mean; we'd have to break at least some of those sorts of assumptions to do it. (In many cases you may have reason to conclude that such effects should be absent or at least relatively small.)