We're learning about multiple regression in the current module of my statistics course, and the instructor noted that the sum of squared errors (SSE) of a full model such as the one below:
$Y_i=\beta_0+\beta_1x_{1i}+\beta_2x_{2i}+\beta_3x_{3i}+\epsilon_i$
is going to be smaller than the SSE for any reduced model, such as the one below (which we obtain under the assumption that $\beta_1=0$):
$Y_i=\beta_0+\beta_2x_{2i}+\beta_3x_{3i}+\epsilon_i$
I'm having trouble understanding why this is true. If SSE is defined as:
$\sum^{n}_{i=1}(y_i-\hat{y}_i)^2$
Shouldn't the full model's SSE be bigger because it has more terms?
Best Answer
If $\beta_1$ is exactly zero, the SSE of the full and reduced models will be identical. To the extent that $\beta_1$ is not exactly zero, the portion of the variance (sum of squares) of $Y$ attributable to $x_1$ is, all else being equal, added to the SSE of the reduced model, because it is no longer captured anywhere in the model except the residual term $\epsilon$.
Another way to see it: the reduced model is a special case of the full model (the full model with $\hat{\beta}_1$ forced to zero), so least-squares fitting over the full model can always achieve at least as small an SSE as the reduced fit. The SSE of a reduced model can therefore never be smaller than that of the full model: it equals the full model's SSE plus any sum of squares attributable to the constraints, to the extent those constraints are not exactly true.
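As a quick numerical check (a sketch using NumPy; the data-generating coefficients and sample size here are made up for illustration), you can fit both models by ordinary least squares and compare their SSEs directly:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Simulate predictors and a response where beta_1 = 0.5 (not zero)
x1, x2, x3 = rng.normal(size=(3, n))
y = 1.0 + 0.5 * x1 + 2.0 * x2 - 1.0 * x3 + rng.normal(size=n)

def sse(X, y):
    """Fit OLS via least squares and return the sum of squared residuals."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return resid @ resid

ones = np.ones(n)
full = np.column_stack([ones, x1, x2, x3])      # intercept, x1, x2, x3
reduced = np.column_stack([ones, x2, x3])       # constraint: beta_1 = 0

# The full model's SSE is never larger than the reduced model's,
# because the reduced fit is one feasible solution for the full model.
print("SSE full:   ", sse(full, y))
print("SSE reduced:", sse(reduced, y))
```

If you instead generate the data with the coefficient on $x_1$ set to zero, the two SSEs come out nearly identical, differing only by sampling noise in the fitted $\hat{\beta}_1$.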