Proof that the plot of fitted values vs. residuals yields a parabola when a needed quadratic term is omitted

linear regression, statistics

There is a step I do not understand in my professor's class. In his slides, he tries to formally show us that when the true, unknown model contains a squared predictor, and this squared term is omitted when estimating the model parameters, then the plots of a) the fitted values vs. the residuals and b) the predictor vs. the residuals will show a parabolic pattern. It is the last step [from (1) to (2)] of his reasoning that I simply don't understand.

He proceeds as follows:

"Assume for instance that the true model is quadradic in $X_1$:
$$y_i=\beta_0+\beta_1x_{i1}+\beta_{11}x_{i1}^2+\epsilon_i$$
but we fit only a linear model
$$\hat y_i=\hat \beta_0+ \hat \beta_1x_{i1}$$
Then,
\begin{align}
e_i & =y_i-\hat y_i \\
& = (\beta_0-\hat \beta_0)+(\beta_1-\hat \beta_1)x_{i1}+\beta_{11}x_{i1}^2+\epsilon_i \qquad (1)\\
& = \gamma_0+\gamma_1(\hat \beta_0 +\hat \beta_1x_{i1})+\gamma_2(\hat \beta_0 +\hat \beta_1x_{i1})^2+\epsilon_i \qquad (2)\\
& = \gamma_0+\gamma_1\hat y_i+\gamma_2\hat y_i^2+\epsilon_i
\end{align}

for certain values of $\gamma_0$, $\gamma_1$ and $\gamma_2$. This shows that both a $(x_{i1},e_i)$ and a $(\hat y_i,e_i)$ plot will show a quadratic curve."

As mentioned above, does anyone see the manipulation he performed to deduce (2) from (1)? How did he rid himself of the "true", unknown $\beta$'s?

I tried to consult a textbook (Sheather's "Modern Approach") to find this out, but its treatment is rather simplistic. It simply assumes that the least squares estimates are close to the unknown population parameters $\beta_0$ and $\beta_1$, so that
$e_i=y_i-\hat y_i=(\beta_0-\hat \beta_0)+(\beta_1-\hat \beta_1)x_{i1}+\beta_{11}x_{i1}^2+\epsilon_i\approx\beta_{11}x_{i1}^2+\epsilon_i$. I'm quite unsatisfied with this simplistic assumption, and it doesn't answer my question above.

Can anyone help me out? If it is simple algebra, sincere apologies!

Many thanks in advance for your help

Best Answer

What you need to use to get from (1) to (2) is

  • $\gamma_0 = \dfrac{\hat\beta_{0}^2\beta_{11}-\hat\beta_{0}\hat\beta_{1}\beta_{1}+\hat\beta_{1}^2\beta_{0}}{\hat\beta_{1}^2}$
  • $\gamma_1 = \dfrac{-2\hat\beta_{0}\beta_{11}+\hat\beta_{1}\beta_{1}-\hat\beta_{1}^2}{\hat\beta_{1}^2}$
  • $\gamma_2 = \dfrac{\beta_{11}}{\hat\beta_{1}^2}$

which works so long as $\hat\beta_{1} \neq 0$.
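One way to see where these come from: since $\hat y_i=\hat \beta_0+\hat \beta_1x_{i1}$, you can invert to get $x_{i1}=(\hat y_i-\hat\beta_0)/\hat\beta_1$ (again assuming $\hat\beta_1\neq 0$), substitute that into (1), and collect powers of $\hat y_i$:

\begin{align}
e_i &= (\beta_0-\hat\beta_0)
     + (\beta_1-\hat\beta_1)\,\frac{\hat y_i-\hat\beta_0}{\hat\beta_1}
     + \beta_{11}\left(\frac{\hat y_i-\hat\beta_0}{\hat\beta_1}\right)^{2}
     + \epsilon_i \\
    &= \underbrace{\frac{\hat\beta_0^2\beta_{11}-\hat\beta_0\hat\beta_1\beta_1+\hat\beta_1^2\beta_0}{\hat\beta_1^2}}_{\gamma_0}
     + \underbrace{\frac{-2\hat\beta_0\beta_{11}+\hat\beta_1\beta_1-\hat\beta_1^2}{\hat\beta_1^2}}_{\gamma_1}\,\hat y_i
     + \underbrace{\frac{\beta_{11}}{\hat\beta_1^2}}_{\gamma_2}\,\hat y_i^2
     + \epsilon_i
\end{align}

so the brackets are exactly the $\gamma$'s listed above.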

But this is not quite the real issue. Here is an illustration of trying to fit a straight line to something which is actually a parabola plus some noise, using R:

set.seed(2020)
truebeta0 <- 4      # true intercept
truebeta1 <- 7      # true linear coefficient
truebeta2 <- 2      # true quadratic coefficient (beta_11)
sdnoise   <- 3      # standard deviation of the noise
X <- 1:10
Xsq <- X^2
Y <- truebeta0 + truebeta1 * X + truebeta2 * Xsq + rnorm(10, 0, sdnoise)
fit1 <- lm(Y ~ X + Xsq)   # correct quadratic model
fit2 <- lm(Y ~ X)         # misspecified straight-line model
plot(Y ~ X)
points(fit1$fitted.values ~ X, type = "l", col = "blue")
points(fit2$fitted.values ~ X, type = "l", col = "red")

to give

[Plot of Y against X, with the fitted parabola in blue and the fitted straight line in red]

and you can see the blue fitted parabola is a good fit with small, apparently random residuals, while the red fitted straight line is not so good: its residuals are larger, and those in the middle have the opposite sign to those at the extremes. If you look at the residuals for the red line, they do indeed look like a parabola plus some noise:

plot(fit2$residuals ~ fit2$fitted.values, type="p", col = "red")
abline(h=0)

[Plot of the residuals of the straight-line fit against its fitted values: a clear parabolic pattern plus noise]
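If you want, you can also check the $\gamma$ formulas numerically on the simulated data above (a quick sketch reusing the objects already defined): plugging in the true and fitted coefficients and adding back the simulated noise should reproduce the residuals of the straight-line fit up to floating-point error.

# Numerical check of the gamma formulas, using the true betas and the
# coefficients of the misspecified straight-line fit
b0hat <- unname(coef(fit2)[1])   # fitted intercept
b1hat <- unname(coef(fit2)[2])   # fitted slope
gamma2 <- truebeta2 / b1hat^2
gamma1 <- (-2 * b0hat * truebeta2 + b1hat * truebeta1 - b1hat^2) / b1hat^2
gamma0 <- (b0hat^2 * truebeta2 - b0hat * b1hat * truebeta1 + b1hat^2 * truebeta0) / b1hat^2
eps  <- Y - (truebeta0 + truebeta1 * X + truebeta2 * Xsq)   # the simulated noise
yhat <- fit2$fitted.values
# should be essentially zero: e_i = gamma0 + gamma1*yhat_i + gamma2*yhat_i^2 + eps_i
max(abs(fit2$residuals - (gamma0 + gamma1 * yhat + gamma2 * yhat^2 + eps)))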
