[Math] Checking my understanding of projection onto a subspace vs. least squares approximation

Tags: least-squares, projection-matrices

I have trouble understanding the topic of projection vs. least squares approximation in an Introductory Linear Algebra class. I know this question has already been asked (Difference between orthogonal projection and least squares solution), but I want to check my understanding.

PROJECTION ONTO SUBSPACE

In projection, the purpose is to find the point $p$ at which $b$ lands when it is projected onto a subspace. The subspace must pass through the origin. For simplicity, assume we are working in $\mathbb{R}^2$ and that the subspace is the line spanned by a vector $a$ of size $2 \times 1$. A point $b$ is projected onto this subspace at $p$. Then $p$ can be calculated as follows:

$$
p = ax = \frac{aa^T}{a^Ta}b
$$

(This is the part that I am most unsure of: what does $x$ actually represent?) In this equation, $x = \frac{a^Tb}{a^Ta}$ is the scalar coefficient of $a$. It tells you how much of $a$ is needed in order to reach the exact location of $p$.

The meaning of $x$ generalizes easily to $\mathbb{R}^n$. There, $A$ is a matrix whose column space is the subspace onto which the point $b$ is projected, and $x$ is a vector telling how much of each column of $A$ is needed in order to reach the point $p = Ax$.
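As a sanity check of these formulas, here is a minimal numpy sketch (the vectors and the matrix below are made-up examples, not anything from the course):

```python
import numpy as np

# Made-up example in R^2: project b onto the line spanned by a.
a = np.array([2.0, 1.0])   # vector spanning the subspace (a line through the origin)
b = np.array([1.0, 3.0])   # point to be projected

x = (a @ b) / (a @ a)      # scalar coefficient: how much of a is needed to reach p
p = x * a                  # projection of b onto the line
print(x, p)                # 1.0 [2. 1.]

# Same idea in R^n: the columns of A span the subspace, and x lists
# how much of each column is needed so that p = A x.
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b3 = np.array([1.0, 2.0, 2.0])
x_vec = np.linalg.solve(A.T @ A, A.T @ b3)
p3 = A @ x_vec
```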

When there are multiple $b$'s, it is often convenient to compute the projection matrix $P$ once,

$P = \frac{aa^T}{a^Ta}$

because $p = Pb$.
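To illustrate the convenience (again a small numpy sketch with invented vectors): $P$ is built once from $a$ and then applied to as many $b$'s as needed.

```python
import numpy as np

a = np.array([[2.0], [1.0]])       # column vector spanning the line
P = (a @ a.T) / (a.T @ a)          # projection matrix P = a a^T / (a^T a)

# Several points at once: each column of B is one b, so each column
# of P @ B is the corresponding projection p = P b.
B = np.array([[1.0, 4.0, -1.0],
              [3.0, 0.0,  2.0]])
projections = P @ B

print(P)                           # [[0.8 0.4]
                                   #  [0.4 0.2]]
print(projections)
```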

LEAST SQUARES APPROXIMATION

In $\mathbb{R}^2$, least squares approximation is about finding a (regression) line that best fits multiple data points.

In the previous case of projection onto a subspace, the vector $a$ spans a known subspace onto which $b$ is projected. In least squares approximation, however, the regression line (onto which the data points are projected) is described by unknown parameters. The regression line is also not necessarily a subspace, since it usually does not pass through the origin.

The role of $x$ also differs between projection onto a subspace and least squares approximation. In projection onto a subspace, $x$ is the coefficient of the subspace vector $a$. In least squares approximation, $x$ contains the intercept and the slope of the regression line onto which the data points are projected.

The projection matrix $P$ is useful in the previous topic (projection onto a subspace) but not very useful in the current topic (least squares approximation).

Even in $\mathbb{R}^2$, $A$ is an $m \times 2$ matrix rather than the single subspace vector of the previous topic. The goal of least squares is to calculate $x$ (the parameters of the regression line):

$$
x = (A^TA)^{-1}A^Tb
$$

$b$ holds the values of the data points on the $y$-axis. The first column of $A$ corresponds to the intercept (it is usually all ones), while the second column of $A$ holds the values of the data points on the $x$-axis.
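Here is a minimal numpy sketch of that formula with a few invented data points (np.linalg.lstsq is used only as a cross-check; it is not part of the derivation above):

```python
import numpy as np

# Invented data points (x_i, y_i).
xs = np.array([0.0, 1.0, 2.0, 3.0])
ys = np.array([1.0, 2.0, 2.0, 4.0])

# Design matrix: first column of ones (intercept), second column the x-values.
A = np.column_stack([np.ones_like(xs), xs])
b = ys

# Normal equations x = (A^T A)^{-1} A^T b; solving is preferable to an explicit inverse.
x = np.linalg.solve(A.T @ A, A.T @ b)
intercept, slope = x
print(intercept, slope)            # 0.9 0.9 for this data

# Cross-check with numpy's built-in least-squares solver.
x_check, *_ = np.linalg.lstsq(A, b, rcond=None)
```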

I know my terminology in the discussion above is imprecise, but I would appreciate it if anyone could point out any serious mistakes in my understanding. One thing I found extremely confusing is that the same method is used for both topics, while $a$ (or $A$), $b$, and $x$ mean totally different things in the two.

Best Answer

Notation is the culprit here. Regression-line fitting is in fact a projection onto a linear subspace, though perhaps not the one you would first think of. Even if you fit a cubic polynomial you are still dealing with a projection onto a linear subspace! Let us stick to regression lines and set

$$
Y=\left( \begin{matrix} y_1 & y_2 & \cdots & y_n \end{matrix} \right)^T,\qquad
M=\left( \begin{matrix} x_1 & x_2 & \cdots & x_n\\ 1 & 1 & \cdots & 1\end{matrix}\right)^T,\qquad
u=\left( \begin{matrix} a \\ b \end{matrix}\right).
$$

We want to estimate $u$ (i.e. $a$ and $b$) so as to minimize the sum $\sum_i (y_i - ax_i - b)^2$, or equivalently the squared distance $\|Y-Mu\|^2=\langle Y-Mu,Y-Mu\rangle$ in $\mathbb{R}^n$ (note that this distance and scalar product live in $n$-dimensional space, not in two dimensions). A minimum is attained when

$$
M^T M u = M^T Y,
$$

which has a unique solution whenever $M^T M$ is invertible.

You may in fact do a least squares fit to any linear combination of (non-linear) functions, say $ae^x+b\sin(x)$. It is linearity with respect to $a$ and $b$ (the fitting constants), not $x$ (the variable), that matters.
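As a small illustration of that last point (a numpy sketch with invented sample points), fitting $ae^x + b\sin(x)$ uses exactly the same normal equations, because the model is linear in $a$ and $b$:

```python
import numpy as np

# Invented noisy samples of y = 0.5*exp(x) + 2.0*sin(x).
xs = np.linspace(0.0, 2.0, 20)
rng = np.random.default_rng(0)
ys = 0.5 * np.exp(xs) + 2.0 * np.sin(xs) + 0.05 * rng.standard_normal(xs.size)

# Each column of M is one basis function evaluated at the sample points;
# the model is linear in the coefficients u = (a, b), not in the variable x.
M = np.column_stack([np.exp(xs), np.sin(xs)])
Y = ys

# Normal equations M^T M u = M^T Y.
u = np.linalg.solve(M.T @ M, M.T @ Y)
a_hat, b_hat = u
print(a_hat, b_hat)   # close to 0.5 and 2.0 for this synthetic data
```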
