Minimizing RSS for a model with missing observations: dummy variable vs dropping observations

least squares, statistics

Suppose that the relationship $Y_i = \beta_1 + \beta_2 X_i + u_i$ is being fitted and that the value of $X$ is missing for some observations. One way of handling the missing values problem is to drop those observations. Another is to set $X = 0$ for the missing observations and include a dummy variable $D$ defined to be equal to 1 if $X$ is missing, 0 otherwise. Demonstrate that the two methods must yield the same estimates of $\beta_1$ and $\beta_2$. Write down an expression for RSS using the second approach, decompose it into the RSS for observations with $X$ present and the RSS for observations with $X$ missing, and determine how the resulting expression is related to RSS when the missing value observations are dropped.

My attempt: Suppose we have $n$ observations in total, the first $k$ of which have $x_i$ present and the rest of which have $x_i$ missing. So we have model 1, where the missing $x_i$ are set to 0, and model 2, where $d_i$ is 1 if $x_i$ is missing and 0 otherwise:
[Image: the two models and their residual sums of squares, $RSS_1$ and $RSS_2$.]

We can see that the second terms of $RSS_1$ and $RSS_2$ are the same. The textbook from which I took this exercise says that the $\beta_1$ and $\beta_2$ minimizing the RSS will be the same for the two models; namely, they are the $\beta_1$ and $\beta_2$ that could be obtained by minimizing the first term of $RSS_2$. But I have no idea why. Is this because the first term of $RSS_1$ does not depend on $x_i$?
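As a sanity check, I can fit both specifications numerically. Here is a minimal sketch on simulated data (plain numpy; the simulated coefficients, sample sizes and variable names are my own choices, not from the textbook):

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 100, 70                        # n observations, the first k have X observed

x = rng.normal(size=n)
y = 2.0 + 3.0 * x + rng.normal(size=n)
x[k:] = np.nan                        # X is treated as missing for the last n - k rows

# Approach 1: drop the observations with missing X
X1 = np.column_stack([np.ones(k), x[:k]])
b1, *_ = np.linalg.lstsq(X1, y[:k], rcond=None)

# Approach 2: set the missing X to 0 and add a dummy D = 1 where X is missing
d = np.isnan(x).astype(float)
X2 = np.column_stack([np.ones(n), np.where(np.isnan(x), 0.0, x), d])
b2, *_ = np.linalg.lstsq(X2, y, rcond=None)

print(b1)                             # [beta1_hat, beta2_hat] from dropping the missing rows
print(b2[:2])                         # the same beta1_hat, beta2_hat from the dummy model
print(b2[2], y[k:].mean() - b2[0])    # beta3_hat equals ybar_missing - beta1_hat
```

The two printed coefficient pairs agree up to rounding error, which is what the textbook claims.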

Best Answer

In model $2$, let's call $\bar{y}_m$ the mean of the $y_i$ for which $x_i$ is missing, and similarly $RSS_m$ their residual sum of squares, while $RSS_p$ is the residual sum of squares where $x_i$ is present. Clearly $RSS_2=RSS_p+RSS_m$.

  • $RSS_p=\sum\limits_\text{present} (y_i - \beta_1-\beta_2 x_i)^2$, and this is minimised when it is equal to $RSS_1$; we can achieve this by setting $\beta_1$ and $\beta_2$ as in model $1$.
  • $RSS_m=\sum\limits_\text{missing} (y_i - \beta_1-\beta_3)^2$, and this is minimised when $\bar{y}_m = \beta_1+\beta_3$. We can achieve this minimum for any $\beta_1$ by setting $\beta_3 = \bar{y}_m-\beta_1$ (this identity is written out just after this list).
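Written out, with $c=\beta_1+\beta_3$ and $n-k$ missing observations (using the notation from the question), the second bullet is just the completing-the-square identity

$$\sum\limits_\text{missing}(y_i-c)^2=\sum\limits_\text{missing}(y_i-\bar{y}_m)^2+(n-k)\,(\bar{y}_m-c)^2,$$

where the cross term vanishes because the deviations of the $y_i$ from $\bar{y}_m$ sum to zero. This is minimised at $c=\bar{y}_m$, with minimum value $\sum\limits_\text{missing}(y_i-\bar{y}_m)^2$.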

Since the minimum of a sum is at least the sum of the minima, and here the two minima can be attained simultaneously:

  • $RSS_2=RSS_p+RSS_m$ is minimised in model $2$ by using the $\beta_1$ and $\beta_2$ found in model $1$ together with $\beta_3 = \bar{y}_m-\beta_1$, so the estimates of $\beta_1$ and $\beta_2$ coincide with those obtained by dropping the missing observations, and the minimised $RSS_2$ equals the dropped-observations RSS plus $\sum\limits_\text{missing}(y_i-\bar{y}_m)^2$. A quick numerical check of this is sketched below.
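The following minimal sketch checks this relationship numerically (simulated data, plain numpy; all names and numbers are illustrative): it compares the minimised $RSS_2$ with the dropped-observations RSS plus $\sum\limits_\text{missing}(y_i-\bar{y}_m)^2$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 80, 50                          # the first k observations have X present
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)
x[k:] = np.nan                         # X missing for the last n - k observations

# RSS from the regression on the k complete observations only
X1 = np.column_stack([np.ones(k), x[:k]])
_, rss1, *_ = np.linalg.lstsq(X1, y[:k], rcond=None)

# RSS_2: missing X set to 0, dummy column added
d = np.isnan(x).astype(float)
X2 = np.column_stack([np.ones(n), np.where(np.isnan(x), 0.0, x), d])
_, rss2, *_ = np.linalg.lstsq(X2, y, rcond=None)

ym = y[k:]                             # responses where X is missing
print(rss2[0])                                    # minimised RSS of the dummy model
print(rss1[0] + np.sum((ym - ym.mean()) ** 2))    # equal up to rounding error
```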