Regression – Trend in Residuals vs Dependent But Not in Residuals vs Fitted

regression, residuals

I am fitting a linear model to a problem and am a little confused by what is going on. Without going into the details, here are the two plots confusing me:

[Plot: Residuals vs Fitted]

[Plot: Residuals vs Y]

Now, the residuals vs fitted plot looks good to me: fairly evenly dispersed, no clear pattern. However, the residuals vs $y$ plot does not look good. I would have expected no clear trend in that relationship either.

This seems basic, but I'm rather lost. I don't want to fit a model that allows large positive errors at high values of $Y$ to make up for large negative errors at low values of $Y$.

Is this just an indication of a poorly fitting model? Or is something else going on here?

EDIT: As requested, here is fitted vs $y$ (now with axis labels):

[Plot: Fitted vs Y]

EDIT 2: I just want to point out that this question has already been asked and answered (albeit in a more abstract sense): What is the expected correlation between residual and the dependent variable?

Best Answer

1) The residuals and the fitted values are uncorrelated by construction. If there were any correlation between them, there would be an uncaptured linear trend in the data, and we could get a closer fit by changing the coefficients until they were uncorrelated.

2) The residuals and the observed $y$-values are always positively correlated (whenever the residual variance is nonzero). This is a necessary consequence of (1):

$$\operatorname{cov}(e, y) = \operatorname{cov}(e, \hat y + e) = \operatorname{cov}(e, \hat y) + \operatorname{var}(e) = 0 + \sigma_e^2 > 0$$

So it would be surprising if there wasn't a trend in the residuals vs $y$ plot.
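As a quick numerical sanity check on the algebra above, here is a minimal Python sketch (my own, not part of the original answer; the sample size, coefficients, and noise level are arbitrary choices). It fits an ordinary least-squares line to simulated data and confirms that the residuals are essentially uncorrelated with the fitted values, while their covariance with $y$ equals the residual variance:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
x = rng.uniform(0, 10, n)
y = 2.0 + 1.5 * x + rng.normal(scale=2.0, size=n)   # true linear model plus noise

# Ordinary least squares fit of y on x (with an intercept)
X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ beta
resid = y - fitted

print("cov(resid, fitted):", np.cov(resid, fitted)[0, 1])   # ~0 by construction
print("cov(resid, y):     ", np.cov(resid, y)[0, 1])        # ~var(resid) > 0
print("var(resid):        ", np.var(resid, ddof=1))
```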


Consider a simulated example:
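The following Python sketch is my own reconstruction of such a simulation (not the answer's original code; the data-generating values are arbitrary). It draws the residuals vs fitted plot with grey slanted lines marking constant observed $y$ (since $y = \hat y + e$, a fixed $y$ corresponds to the line $e = y - \hat y$, which has slope $-1$), alongside the residuals vs observed plot:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
n = 300
x = rng.uniform(0, 10, n)
y = 1.0 + 0.5 * x + rng.normal(scale=1.5, size=n)

# Ordinary least squares fit with an intercept
X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ beta
resid = y - fitted

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# Residuals vs fitted: no trend; grey slope -1 lines mark constant observed y
ax1.scatter(fitted, resid, s=10)
for y0 in np.quantile(y, np.linspace(0.05, 0.95, 7)):
    ax1.plot([fitted.min(), fitted.max()],
             [y0 - fitted.min(), y0 - fitted.max()],
             color="grey", linewidth=0.8)
ax1.set_ylim(resid.min() - 0.5, resid.max() + 0.5)
ax1.axhline(0, color="black", linewidth=0.8)
ax1.set_xlabel("Fitted values")
ax1.set_ylabel("Residuals")
ax1.set_title("Residuals vs fitted")

# Residuals vs observed y: positive trend, as the covariance argument predicts
ax2.scatter(y, resid, s=10)
ax2.axhline(0, color="black", linewidth=0.8)
ax2.set_xlabel("Observed y")
ax2.set_ylabel("Residuals")
ax2.set_title("Residuals vs observed")

plt.tight_layout()
plt.show()
```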

Note that plotting residuals against the observed values is equivalent to using slanted axes in the residuals vs fitted plot:

[Plot: residuals vs fitted, with grey slanted lines marking constant observed values]

The reason why high observed values (the grey slanted lines mark constant observed values; the ones to the far right are high) are associated with high residuals is clear here, as is the reason why the residuals are only positive near the upper end.