Regression Residuals – Does It Make Sense to Study Plots of Residuals with Respect to the Dependent Variable?

regression residuals

I would like to know whether it makes sense to study the plots of residuals with respect to the dependent variable when I've got a univariate regression. If it makes sense, what does a strong, linear, growing correlation between residuals (on the y-axis) and the estimated values of the dependent variable (on the x-axis) mean?

[Figure: scatter plot with the residuals on the y-axis, showing a strong, increasing linear pattern]

Best Answer

Suppose that you have the regression $y_i = \beta_0 + \beta_1 x_i + \epsilon_i$, where $\beta_1 \approx 0$. Then $y_i - \beta_0 \approx \epsilon_i$, so the higher the $y$ value, the bigger the residual. In contrast, a plot of the residuals against $x$ should show no systematic relationship. Also, the predicted value $\hat{y}_i$ should be approximately $\hat{\beta}_0$, the same for every observation. If all the predicted values are roughly the same, they should be uncorrelated with the residuals.
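The same point can be made exactly rather than approximately. As a sketch, write $e_i = y_i - \hat{y}_i$ for the OLS residuals (the symbol $e_i$ is my notation, not the question's) and assume the model includes an intercept. The normal equations then give $\sum_i e_i = 0$ and $\sum_i e_i \hat{y}_i = 0$, so the sample covariance between residuals and fitted values is zero, and

$$
\operatorname{Cov}(e, y) = \operatorname{Cov}(e, \hat{y} + e) = \operatorname{Var}(e),
\qquad
\operatorname{Corr}(e, y) = \frac{\operatorname{Var}(e)}{\sqrt{\operatorname{Var}(e)\,\operatorname{Var}(y)}} = \sqrt{\frac{\operatorname{Var}(e)}{\operatorname{Var}(y)}} = \sqrt{1 - R^2}.
$$

So the residuals are always positively correlated with $y$ itself, and that correlation approaches 1 as $R^2$ approaches 0, which is the case $\beta_1 \approx 0$ described above.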

What the plot is telling me is that $x$ and $y$ are essentially unrelated (of course, there are better ways to show this). Let us know if your coefficient $\hat{\beta}_1$ is not close to 0.

As a better diagnostic, plot the residuals against the predicted values $\hat{y}_i$ or against the $x$ values. You should not observe a distinguishable pattern in these plots.

If you want a little R demonstration, here you go:

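set.seed(1)                 # make the simulation reproducible
y      <- rnorm(100, 0, 5)  # response, generated independently of x
x      <- rnorm(100, 0, 2)  # predictor, unrelated to y by construction
fit    <- lm(y ~ x)         # fit the model once and reuse it
res    <- residuals(fit)
fitted <- fitted(fit)
plot(y, res)                # strong increasing linear pattern
plot(x, res)                # no systematic pattern
plot(fitted, res)           # no systematic pattern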
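If you would rather see numbers than eyeball the plots, a check along the following lines may help. It is only a sketch under the same simulated setup: the seed and the names `fit`, `res` and `r2` are my own choices. It relies on the fact that, for an OLS fit with an intercept, the sample correlation between the residuals and $y$ equals $\sqrt{1 - R^2}$, while the residuals are uncorrelated with both the fitted values and $x$.

```r
set.seed(42)                      # arbitrary seed, only for reproducibility
y   <- rnorm(100, 0, 5)           # response, generated independently of x
x   <- rnorm(100, 0, 2)           # predictor
fit <- lm(y ~ x)                  # OLS fit with an intercept
res <- residuals(fit)
r2  <- summary(fit)$r.squared

cor(y, res)                       # close to 1 because x explains almost nothing
sqrt(1 - r2)                      # matches cor(y, res) up to rounding
cor(fitted(fit), res)             # zero up to floating-point error
cor(x, res)                       # zero up to floating-point error
```

With a predictor that really does explain $y$, $R^2$ rises and `cor(y, res)` falls accordingly, but it stays positive unless the fit is perfect.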