Regression – Understanding Linear Projection in ‘The Elements of Statistical Learning’

machine-learning, regression

In the book "The Elements of Statistical Learning" in chapter 2 ("Linear models and least squares; page no: 12"), it is written that

In the $(p+1)$-dimensional input–output space, $(X, \hat{Y})$ represents a hyperplane. If the constant is included in $X$, then the hyperplane includes the origin and is a subspace; if not, it is an affine set cutting the $Y$-axis at the point $(0, \hat{\beta}_0)$.

I don't get the sentence "if the constant is … $(0, \hat{\beta}_0)$". Could someone help? I think the hyperplane would cut the $Y$-axis at $(0, \hat{\beta}_0)$ in both cases; is that correct?

The answer below has helped somewhat, but I am looking for a more specific answer. I understand that when the constant $1$ is included in $X$, the input vectors can never be the origin, but then how would $(X, Y)$ contain the origin? Shouldn't that depend on the value of $\beta$? If the intercept $\beta_0$ is not $0$, then $(X, Y)$ should not contain the origin, in my understanding.

Best Answer

Including the constant $1$ in the input vector is a common trick to include a bias term (think of the $Y$-intercept) while keeping all the terms of the expression symmetric: you can write $\beta X$ instead of $\beta_0 + \beta X$ everywhere.

If you do this, it is then correct that the hyperplane $Y = \beta X$ includes the origin: the origin is the all-zeros vector, and multiplying it by $\beta$ gives the value $0$.

However, your input vectors will always have the constant element equal to $1$; therefore they can never coincide with the origin, and they lie on a smaller affine set, which has one less dimension.
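To make this concrete, here is a minimal NumPy sketch (my own illustration, not from the book or the original answer). It fits a one-dimensional example with the constant $1$ prepended to each input, checks that the fitted hyperplane $Y = \beta X$ passes through the origin of the augmented input–output space, and reads off the intercept $\beta_0$ at which the affine view $Y = \beta_0 + \beta_1 x$ cuts the $Y$-axis.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data with one input: y is roughly 2*x + 3.
x = rng.uniform(-5, 5, size=50)
y = 2 * x + 3 + rng.normal(scale=0.1, size=50)

# Bias trick: prepend the constant 1 to every input vector.
X = np.column_stack([np.ones_like(x), x])      # each row is [1, x]
beta, *_ = np.linalg.lstsq(X, y, rcond=None)   # beta = [beta0, beta1]
print(beta)                                    # approximately [3, 2]

# With the constant inside X, the fitted relation Y = X @ beta is linear,
# so the zero vector of the augmented input space maps to Y = 0:
# the hyperplane contains the origin of the (p+1)-dimensional space.
print(np.zeros(2) @ beta)                      # 0.0

# But every actual input row has its first coordinate fixed at 1,
# so no observed input ever coincides with that origin.
print(X[:, 0].min(), X[:, 0].max())            # 1.0 1.0

# Read as an affine function of x alone, Y = beta0 + beta1 * x
# cuts the Y-axis at (0, beta0).
print(beta[0] + beta[1] * 0.0)                 # approximately 3
```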

You can visualize this by thinking of a line $Y = mx + q$ on a sheet of paper (2 dimensions). If you include the bias $q$ in the coefficients, your input vector becomes $X = [x, x_0 = 1]$ and your coefficients become $\beta = [m, q]$, so that $Y = \beta \cdot X$. In 3 dimensions this is a plane passing through the origin, and it intersects the plane $x_0 = 1$ in exactly the line where your inputs can actually lie.
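A quick numeric check of this picture (again my own sketch, not part of the original answer): with $m = 2$ and $q = 3$, the plane $Y = \beta \cdot X$ in the 3-dimensional $(x, x_0, Y)$ space contains the origin, while all actual inputs sit on the slice $x_0 = 1$, where the plane reduces to the familiar line $Y = mx + q$.

```python
import numpy as np

m, q = 2.0, 3.0
beta = np.array([m, q])                       # coefficients [m, q], as in the answer

# The plane Y = beta . X contains the origin of (x, x0, Y) space:
print(np.array([0.0, 0.0]) @ beta)            # 0.0, i.e. the point (0, 0, 0) lies on it

# Real inputs always have x0 = 1, so they lie on the slice x0 = 1.
xs = np.linspace(-2, 2, 5)
X = np.column_stack([xs, np.ones_like(xs)])   # each row is [x, x0 = 1]

# On that slice the plane reduces to the 2-D line Y = m*x + q.
print(np.allclose(X @ beta, m * xs + q))      # True
```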