Solved – Residual plots: why plot versus fitted values, not observed $Y$ values

regression, residuals

In the context of OLS regression, I understand that a plot of residuals versus fitted values is conventionally used to check for constant variance and to assess model specification. Why are the residuals plotted against the fitted values and not against the observed $Y$ values? How does the information differ between these two plots?

I am working on a model that produced the following residual plots:

[Plot 1: residuals vs. fitted values]

[Plot 2: residuals vs. observed $Y$ values]

At a quick glance, the plot against the fitted values looks fine, but the second plot against the $Y$ values shows a clear pattern. I'm wondering why such a pronounced pattern wouldn't also show up in the residual-vs-fitted plot.

I'm not looking for help diagnosing issues with the model; I'm just trying to understand the general difference between (1) the residual-vs-fitted plot and (2) the residual-vs-$Y$ plot.

For what it's worth, I'm fairly sure the pattern in the second chart is due to omitted variables that influence the DV. I'm currently working on obtaining that data, which I expect will improve the overall fit and specification. I am working with real estate data: DV = sales price; IVs = square footage of the house, number of garage spaces, year built, and year built$^2$.

Best Answer

By construction, the residuals in an OLS model are uncorrelated (in sample) with the observed values of the $X$ covariates. This holds for the observed data even when the model yields biased estimates that do not reflect the true parameter values because a model assumption is violated (such as an omitted variable or reverse causality). The predicted values are entirely a function of these covariates, so they too are uncorrelated with the residuals. Thus, when you plot residuals against predicted values, the plot should always look random, because the two are uncorrelated by construction of the estimator.

In contrast, it is entirely possible (and indeed probable) for the residuals to be correlated with $Y$. For example, with a dichotomous $X$ variable, the further the true $Y$ is from $E(Y \mid X = 1)$ or $E(Y \mid X = 0)$, the larger the residual will be. Here is the same intuition with simulated data in R, where we know the model is unbiased because we control the data-generating process:

rm(list = ls())
set.seed(21391209)

trueSd <- 10
trueA  <- 5
trueB  <- as.matrix(c(3, 5, -1, 0))
sampleSize <- 100

# create x-values (x3 and x4 are deliberately correlated with x1 and x2)
x1 <- rnorm(n = sampleSize, mean = 0, sd = 4)
x2 <- rnorm(n = sampleSize, mean = 5, sd = 10)
x3 <- 3 + x1 * 4 + x2 * 2 + rnorm(n = sampleSize, mean = 0, sd = 10)
x4 <- -50 + x1 * 7 + x2 * .5 + x3 * 2 + rnorm(n = sampleSize, mean = 0, sd = 20)
X  <- as.matrix(cbind(x1, x2, x3, x4))

# create dependent values according to a + Xb + N(0, sd)
Y <- trueA + X %*% trueB + rnorm(n = sampleSize, mean = 0, sd = trueSd)

df <- as.data.frame(cbind(Y, X))
colnames(df) <- c("y", "x1", "x2", "x3", "x4")
ols   <- lm(y ~ x1 + x2 + x3 + x4, data = df)
y_hat <- predict(ols, df)
error <- Y - y_hat
cor(y_hat, error)  # essentially zero, by construction
cor(Y, error)      # not zero
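The zero correlation in the first check is an algebraic identity rather than a statistical result. Writing the hat matrix as $H = X(X^\top X)^{-1}X^\top$, the fitted values are $\hat{y} = HY$ and the residuals are $e = (I - H)Y$, so

$$\hat{y}^\top e = Y^\top H^\top (I - H)Y = Y^\top (H - H^2)Y = 0,$$

since $H$ is symmetric and idempotent ($H^2 = H$). With an intercept in the model, the residuals also sum to zero, so the sample correlation between $\hat{y}$ and $e$ is exactly zero, whether or not the model is correctly specified.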

We get the same zero correlation even with a biased model, for example if we omit x1:

ols2   <- lm(y ~ x2 + x3 + x4, data = df)
y_hat2 <- predict(ols2, df)
error2 <- Y - y_hat2
cor(y_hat2, error2)  # still essentially zero
cor(Y, error2)       # not zero
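As a footnote, the correlation between $Y$ and the residuals is not just nonzero: for a model with an intercept it equals $\sqrt{1 - R^2}$ exactly, because $\operatorname{Var}(Y) = \operatorname{Var}(\hat{y}) + \operatorname{Var}(e)$ and $\operatorname{Cov}(Y, e) = \operatorname{Var}(e)$. A minimal sketch with freshly simulated data (variable names here are my own, not from the answer above); it also shows that the residuals of the misspecified model do pick up the omitted covariate:

```r
set.seed(42)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
# true model depends on both covariates
y  <- 1 + 3 * x1 + 2 * x2 + rnorm(n)

fit  <- lm(y ~ x1 + x2)  # correctly specified
fit2 <- lm(y ~ x2)       # omits x1

# residuals are orthogonal to fitted values in both models...
cor(fitted(fit),  resid(fit))   # essentially zero
cor(fitted(fit2), resid(fit2))  # essentially zero

# ...while cor(y, residuals) equals sqrt(1 - R^2) exactly
r2 <- summary(fit)$r.squared
all.equal(cor(y, resid(fit)), sqrt(1 - r2))  # TRUE

# and the residuals of the biased model correlate with the omitted x1
cor(x1, resid(fit2))  # strongly positive
```

This is why a residual-vs-$Y$ plot always shows some upward drift even for a perfectly specified model: the weaker the fit (lower $R^2$), the stronger that built-in correlation.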