Solved – Weird fitted values/residuals plot

I am doing some regressions on real earnings as a function of some different variables and this came out:

[Image: plot of fitted values against residuals]

Is this because earnings cannot be negative?

Best Answer

You can answer the question for yourself with simple mathematics. If observed $y \ge 0$ and $\hat y$ denotes fitted $y$, then residual $e = y - \hat y$ must be $\ge -\hat y$. The line $e = - \hat y$ is thus a lower limit on your residuals. Despite your unconventional axis choice, it is clear that your data follow suit.
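A quick numerical check of that bound, as a minimal sketch rather than a reconstruction of the OP's regression: the data below are simulated (a hypothetical non-negative, right-skewed outcome standing in for earnings), and `ols_fit` is an illustrative helper, not a library function.

```python
import random

def ols_fit(x, y):
    """Return (intercept, slope) of the least-squares line through (x, y)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

random.seed(1)
# Assumed, simulated data: y is never negative and is right-skewed.
x = [random.uniform(0, 10) for _ in range(200)]
y = [random.expovariate(1 / (1 + 0.5 * xi)) for xi in x]

intercept, slope = ols_fit(x, y)
fitted = [intercept + slope * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

# Because y >= 0, every residual lies on or above the line e = -fitted.
bound_holds = all(e >= -f - 1e-12 for e, f in zip(residuals, fitted))
print(bound_holds)
```

Plotting `residuals` against `fitted` for such data shows the same sharp diagonal edge as in the question: points pile up along $e = -\hat y$ wherever observed $y$ is at or near zero.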

The underlying problem is presumably the use of a standard linear model on data that do not suit it. One way forward is a log-linear or Poisson(-like) model: fortunately for the OP as a Stata user, there is a Stata-rich explanation in this blog posting. The posting should nevertheless be of considerable interest to many users of statistics, whatever software they use.
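To illustrate why a Poisson-like model with a log link sidesteps the problem, here is a rough sketch that fits $\log E(y) = a + bx$ by Newton's method on the Poisson score equations. It is hand-rolled for illustration only: the data are simulated, the names `a`, `b` and `fitted_log_link` are my own, and this is not the Stata workflow the blog posting describes.

```python
import math
import random

random.seed(1)
# Assumed, simulated data: a non-negative, right-skewed outcome.
x = [random.uniform(0, 10) for _ in range(200)]
y = [random.expovariate(1 / (1 + 0.5 * xi)) for xi in x]

# Start from the intercept-only fit: a = log(mean of y), b = 0.
a = math.log(sum(y) / len(y))
b = 0.0

for _ in range(25):
    mu = [math.exp(a + b * xi) for xi in x]
    # Score (gradient) of the Poisson log-likelihood.
    g_a = sum(yi - mi for yi, mi in zip(y, mu))
    g_b = sum((yi - mi) * xi for yi, mi, xi in zip(y, mu, x))
    # Information matrix entries (negative Hessian), a 2 x 2 system.
    h_aa = sum(mu)
    h_ab = sum(mi * xi for mi, xi in zip(mu, x))
    h_bb = sum(mi * xi * xi for mi, xi in zip(mu, x))
    det = h_aa * h_bb - h_ab * h_ab
    # Newton update: solve the 2 x 2 system explicitly.
    a += (h_bb * g_a - h_ab * g_b) / det
    b += (h_aa * g_b - h_ab * g_a) / det

# Fitted values are exponentials, so they are positive by construction.
fitted_log_link = [math.exp(a + b * xi) for xi in x]
print(min(fitted_log_link) > 0)
```

The outcome need not be a count for this to be useful: the fit only requires $y \ge 0$, and the fitted values can never go negative, unlike those from a straight-line fit.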

P.S. A standard residual plot has residuals on the vertical axis and fitted or predicted values on the horizontal axis. The choice of axes here is not an arbitrary convention. A horizontal line at zero residual is the natural reference line, since it shows what a perfect model would produce. As emphasised often by J.W. Tukey and others, the best references are linear, and the best linear references are horizontal, in the sense of being easiest to think about. In Stata there is a built-in post-estimation command rvfplot for use after regress.

P.P.S. The graph flags a Stata user. Naturally use of Stata is quite secondary here to the main question.