Why does having $X_0 = 1$ mean that the hyperplane includes the origin

affine-geometry, geometry, linear-algebra

I was just reading this question on stats.stackexchange, because I had the same question: why does having $X_0 = 1$ mean that the hyperplane includes the origin, and why is it an affine set cutting the $Y$-axis at the point $(0, \hat{\beta}_0)$ if the constant is not included in $X$? However, I don't think the answer there actually explains this; it seems to just restate the claim more verbosely. Judging by the mathematics involved, I think the question is better suited to the minds at math.stackexchange. So I am looking for a clear explanation of why this is the case: why does having $X_0 = 1$ mean that the hyperplane includes the origin, and why is it an affine set cutting the $Y$-axis at the point $(0, \hat{\beta}_0)$ if the constant is not included in $X$?

The textbook section is 2.3.1, Linear Models and Least Squares, from here. The relevant parts are all at the beginning of that section.


EDIT:

The part that I'm interested in is

Often it is convenient to include the constant variable $1$ in $X$, …

and

In the $(p + 1)$-dimensional input-output space, $(x, \hat{Y})$ represents a hyperplane. If the constant is included in $X$, then the hyperplane includes the origin and is a subspace; if not, it is an affine set cutting the $Y$-axis at the point $(0, \hat{\beta}_0)$.

Best Answer

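The equation $y=\beta_0+\beta_1 x_1+\cdots+\beta_p x_p$, where the unknowns are $(x_1,\dots,x_p,y)$, describes an affine hyperplane $H$ of the affine space $\Bbb R^{p+1}$ that does not pass through the origin if $\beta_0\ne 0$: setting $x_1=\dots=x_p=0$ forces $y=\beta_0$, so $H$ meets the $y$-axis exactly at the point $(0,\dots,0,\beta_0)$.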
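A small numerical check of this paragraph may help. The sketch below is a minimal illustration in plain Python, assuming arbitrary example coefficients $\beta_0=3$, $\beta=(2,-1)$ (so $p=2$); the helper name `on_H` is hypothetical. It checks that $H$ contains $(0,0,\beta_0)$ but not the origin, and that $H$ is not closed under addition, so it cannot be a linear subspace.

```python
def on_H(beta0, beta, x, y):
    """True if (x_1, ..., x_p, y) satisfies y = beta0 + sum_j beta_j * x_j."""
    return y == beta0 + sum(bj * xj for bj, xj in zip(beta, x))

# Illustrative coefficients (assumed, not from the textbook), p = 2, beta0 != 0.
beta0 = 3
beta = [2, -1]
p = len(beta)

origin = ([0] * p, 0)
y_axis_point = ([0] * p, beta0)  # the point (0, ..., 0, beta0)

print(on_H(beta0, beta, *origin))        # False: H misses the origin
print(on_H(beta0, beta, *y_axis_point))  # True: H cuts the y-axis at beta0

# Two points of H whose sum is not on H, so H is not a subspace.
pt_a = ([1, 0], beta0 + beta[0])
pt_b = ([0, 1], beta0 + beta[1])
pt_sum = ([u + v for u, v in zip(pt_a[0], pt_b[0])], pt_a[1] + pt_b[1])
print(on_H(beta0, beta, *pt_a), on_H(beta0, beta, *pt_b))  # True True
print(on_H(beta0, beta, *pt_sum))                           # False
```

The sum fails because its $y$-coordinate picks up $\beta_0$ twice, which is exactly the obstruction a nonzero intercept creates.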

The equation $y=\beta_0x_0+\beta_1 x_1+\cdots+\beta_p x_p$, where the unknowns are now $(x_0,x_1,\dots,x_p,y)$, describes an affine hyperplane of the affine space $\Bbb R^{p+2}$ that does pass through the origin (hence it's a subspace of dimension $p+1$ of $\Bbb R^{p+2}$ considered as a vector space).
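To see the dimension count concretely, one can view this set $V$ as the kernel of the single nonzero row $(\beta_0,\dots,\beta_p,-1)$, so by rank-nullity $\dim V=(p+2)-1=p+1$. The sketch below assumes illustrative coefficients $(\beta_0,\beta_1,\beta_2)=(3,2,-1)$; `in_V` and `rank` are hypothetical helpers written here with exact `fractions.Fraction` arithmetic, not part of any library.

```python
from fractions import Fraction

def in_V(beta, z):
    """z = (x_0, ..., x_p, y); True if y = sum_j beta_j * x_j (beta includes beta_0)."""
    *x, y = z
    return y == sum(bj * xj for bj, xj in zip(beta, x))

def rank(rows):
    """Rank of a matrix (list of rows) by Gaussian elimination over the rationals."""
    m = [[Fraction(v) for v in row] for row in rows]
    r = 0
    for c in range(len(m[0]) if m else 0):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

beta = [3, 2, -1]  # (beta_0, beta_1, beta_2), illustrative; p = 2
p = len(beta) - 1
n = p + 2          # ambient space R^{p+2}

print(in_V(beta, [0] * n))  # True: V contains the origin

# Vectors e_j + beta_j * e_y, j = 0..p, all lie in V and are independent.
basis = []
for j in range(p + 1):
    v = [0] * n
    v[j] = 1
    v[-1] = beta[j]
    basis.append(v)
print(all(in_V(beta, v) for v in basis))  # True
print(rank(basis))                        # 3, i.e. p + 1

# V is closed under linear combinations.
combo = [2 * a - 5 * b for a, b in zip(basis[0], basis[1])]
print(in_V(beta, combo))  # True

# Rank-nullity: V is the kernel of one nonzero row.
print(n - rank([beta + [-1]]))  # 3, i.e. p + 1
```

Since $V$ has dimension $p+1$ and the $p+1$ vectors above are independent and lie in $V$, they form a basis of it.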

However, to describe the same model you need the additional constraint $x_0=1$, and these two equations together describe an affine subset $S$ of $\Bbb R^{p+2}$ of dimension $p$. It's simply an embedding in $\Bbb R^{p+2}$ of the hyperplane $H$ defined above. It's not a subspace of $\Bbb R^{p+2}$, because every $x\in S$ has $x_0=1$, so in particular $S$ does not contain the origin. Nor is it a hyperplane, because it has dimension $p$, not $p+1$.
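The same kind of check can illustrate this paragraph. The sketch below again assumes illustrative coefficients $(3,2,-1)$; `in_S`, `embed`, and `rank` are hypothetical helpers. It confirms that the two equations are independent (so $\dim S=(p+2)-2=p$), that $S$ misses the origin, that the map $(x_1,\dots,x_p,y)\mapsto(1,x_1,\dots,x_p,y)$ sends points of $H$ into $S$, and that $S$ is not closed under addition.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix (list of rows) by Gaussian elimination over the rationals."""
    m = [[Fraction(v) for v in row] for row in rows]
    r = 0
    for c in range(len(m[0]) if m else 0):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def in_S(beta, z):
    """z = (x_0, ..., x_p, y); True if x_0 = 1 and y = sum_j beta_j * x_j."""
    *x, y = z
    return x[0] == 1 and y == sum(bj * xj for bj, xj in zip(beta, x))

def embed(x, y):
    """Map a point (x_1, ..., x_p, y) of H to (1, x_1, ..., x_p, y)."""
    return [1] + list(x) + [y]

beta = [3, 2, -1]  # (beta_0, beta_1, beta_2), illustrative; p = 2
p = len(beta) - 1
n = p + 2

# Model equation and x_0 = 1; the direction space of S is their common kernel.
constraints = [beta + [-1], [1] + [0] * (p + 1)]
print(n - rank(constraints))  # 2, i.e. p

print(in_S(beta, [0] * n))  # False: S misses the origin

x = [4, 7]
y = beta[0] + sum(bj * xj for bj, xj in zip(beta[1:], x))  # a point of H
print(in_S(beta, embed(x, y)))  # True: the embedded point lies in S

z1 = embed([1, 0], beta[0] + beta[1])
z2 = embed([0, 1], beta[0] + beta[2])
print(in_S(beta, [a + b for a, b in zip(z1, z2)]))  # False: the sum has x_0 = 2
```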

Therefore, I regard the sentence "If the constant is included in $X$, then the hyperplane includes the origin and is a subspace; if not, it is an affine set cutting the $Y$-axis at the point $(0,\hat\beta_0)$" as wrong.

That being said, I think it's a minor error that does not impair the subsequent exposition of the linear model. I have another concern about the randomness in the model being completely hidden, but this is the introduction of the chapter, and later on the epsilons are introduced as expected, to address inference.
