Suppose $\{(x_i,y_i,z_i): i=1,2,\dots,n\}$ is a set of trivariate observations on three variables $X, Y, Z$, where $z_i=0$ for $i=1,2,\dots,n-1$ and $z_n=1$. Suppose the least squares linear regression equation of $Y$ on $X$ based on the first $n-1$ observations is $y=\hat{\alpha_0}+\hat{\alpha_1}x$, and the least squares linear regression equation of $Y$ on $X$ and $Z$ based on all $n$ observations is $y=\hat{\beta_0}+\hat{\beta_1}x+\hat{\beta_2}z$.
We need to show that $\hat{\alpha_1}=\hat{\beta_1}$.
My approach:
For the first $n-1$ observations we have $z_i=0$, so we consider an ordinary simple linear regression model of $Y$ on $X$.
Thus, the least squares estimate is $\hat{\alpha_1}=\frac{\sum_{i=1}^{n-1} (x_i-\bar{x})(y_i-\bar{y})}{\sum_{i=1}^{n-1} (x_i-\bar{x})^2}$, where $\bar{x}$ and $\bar{y}$ are the means over the first $n-1$ observations.
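As a quick sanity check on this closed-form slope, one can compare it against a standard degree-1 polynomial fit; a minimal sketch using NumPy (the simulated data and coefficients here are purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
m = 9  # plays the role of n - 1
x = rng.normal(size=m)
y = 1.0 + 2.5 * x + rng.normal(size=m)  # hypothetical data

# Closed-form least squares slope from the formula above
slope = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean())**2)

# np.polyfit with degree 1 returns [slope, intercept]
slope_pf, intercept_pf = np.polyfit(x, y, 1)
assert np.allclose(slope, slope_pf)
```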
And in the second case, we have:
$y_1=\beta_0+\beta_1 x_1+e_1$
$y_2=\beta_0+\beta_1 x_2+e_2$
$\vdots$
$y_n=\beta_0+\beta_1 x_n+\beta_2+e_n$
Thus, the error sum of squares is
$$\sum_{i=1}^{n-1} (y_i-\beta_0-\beta_1 x_i)^2+(y_n-\beta_0-\beta_1 x_n-\beta_2)^2.$$
Differentiating this with respect to $\beta_0,\beta_1,\beta_2$ and setting the derivatives to $0$, we get the same estimate $\hat{\beta_1}$: plugging $\hat{\beta_2}=y_n-\hat{\beta_1}x_n-\hat{\beta_0}$ back in, the normal equations for $\beta_0$ and $\beta_1$ reduce to those of the first regression.
So, is my approach correct, or is there a major flaw I'm missing? Let me know.
Best Answer
Your approach is correct.
By differentiating with respect to $\beta_2$, we see that at the optimum we must have
$$\hat{\beta_2} = y_n -\hat{\beta_1}x_n-\hat{\beta_0},$$
that is, the last term of the objective function must vanish.
Hence solving for $\hat{\beta_0}$ and $\hat{\beta_1}$ reduces to minimizing
$$\sum_{i=1}^{n-1} (y_i-\beta_0-\beta_1 x_i)^2$$
Hence, we know that $\hat{\beta_1}=\hat{\alpha_1}$ and furthermore, $\hat{\beta_0}=\hat{\alpha_0}$.
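The whole claim is easy to verify numerically: fit both regressions by least squares and check that the slopes (and intercepts) agree, and that the dummy coefficient equals the residual of the last observation. A minimal sketch with NumPy, using simulated data (all values here are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10
x = rng.normal(size=n)
y = 2.0 + 3.0 * x + rng.normal(size=n)  # hypothetical data
z = np.zeros(n)
z[-1] = 1.0  # dummy: 1 only for the last observation

# Regression of Y on X using the first n-1 observations
A1 = np.column_stack([np.ones(n - 1), x[:-1]])
alpha, *_ = np.linalg.lstsq(A1, y[:-1], rcond=None)

# Regression of Y on X and Z using all n observations
A2 = np.column_stack([np.ones(n), x, z])
beta, *_ = np.linalg.lstsq(A2, y, rcond=None)

assert np.allclose(alpha[1], beta[1])  # slopes agree
assert np.allclose(alpha[0], beta[0])  # intercepts agree too
# The dummy coefficient absorbs the last observation exactly:
assert np.allclose(beta[2], y[-1] - beta[0] - beta[1] * x[-1])
```

Intuitively, the dummy variable gives the last observation its own free parameter, so that observation contributes nothing to the fit of $\beta_0$ and $\beta_1$.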