Projection matrix formula intuition

linear-algebra linear-transformations matrices projection projection-matrices

I completely understand how the projection matrix formula $$P = A(A^TA)^{-1}A^T$$ is derived from $$A^T(b - A\hat{x}) = 0,$$ but what I don't understand is the "story proof" or the "intuition" behind the first formula as a linear transformation onto the column space of $A$, which is what it is supposed to be.

In fact I have three specific questions:

  • Why would someone transform $b$ (the vector to be projected onto the column space of $A$) into the "row space" of $A$ before it can be transformed into the column space of $A$?
  • What specific transformation does the matrix $A^TA$ encode?
  • Why should one apply the inverse of the transformation $A^TA$ to the vector before it can be transformed into the column space of $A$?

Best Answer

First note that $P$ maps the column space $R(A)$ identically onto itself. Indeed, every vector in $R(A)$ has the form $Ax$ for some $x$ in the domain of $A$, and we have $\require{extpfeil}\Newextarrow{\xmapsto}{5,5}{0x27FC}$

$$Ax \,\xmapsto{A^T} A^TAx \,\xmapsto{(A^TA)^{-1}} x \,\xmapsto{A} Ax.$$

On the other hand, for a vector $y \in R(A)^\perp$, recall that $R(A)^\perp = N(A^T)$, so $$y\,\xmapsto{A^T} 0 \,\xmapsto{A(A^TA)^{-1}} 0.$$ Finally, the domain of $P$ can be decomposed as $R(A) \oplus R(A)^\perp$, so with respect to this decomposition we have $P = I \oplus 0$, which means precisely that $P$ is an orthogonal projection onto $R(A)$.
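The two chains of maps above can be checked numerically. The following is only a minimal sketch, under assumptions that are not part of the answer: NumPy is used, $A$ is a random $5 \times 3$ matrix (so it has full column rank with probability 1 and $A^TA$ is invertible), and the names `v`, `step1`, `step2`, `step3`, `U`, and `y` are purely illustrative. A basis of $R(A)^\perp$ is taken from the last columns of the full SVD factor `U`, so the test vector `y` is built without using $P$ itself.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumption: a random tall matrix, which has full column rank with probability 1,
# so that A^T A is invertible.
A = rng.standard_normal((5, 3))
P = A @ np.linalg.inv(A.T @ A) @ A.T

# Chain 1: a vector v = Ax in the column space R(A).
x = rng.standard_normal(3)
v = A @ x
step1 = A.T @ v                          # A^T A x
step2 = np.linalg.solve(A.T @ A, step1)  # (A^T A)^{-1} undoes A^T A, giving back x
step3 = A @ step2                        # back to A x

# Chain 2: a vector y in R(A)^perp = N(A^T).
# In the full SVD A = U S V^T, the last m - n columns of U span R(A)^perp.
U, _, _ = np.linalg.svd(A)               # full_matrices=True by default, U is 5x5
y = U[:, 3:] @ rng.standard_normal(2)

print(np.allclose(step2, x), np.allclose(step3, v))      # P fixes R(A)
print(np.allclose(A.T @ y, 0), np.allclose(P @ y, 0))    # P kills R(A)^perp
```

In this reading, $A^T$ turns a vector of $R(A)$ into its "coordinates distorted by $A^TA$", the inverse $(A^TA)^{-1}$ removes that distortion to recover the coefficient vector $x$, and the final $A$ rebuilds $Ax$ from those coefficients.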