Intuitive proof of the least squares formula

Tags: least-squares, linear-algebra

I am solving the problem $A\vec{x}=\vec{b}$. I'm considering $\mathbb{R}^3$, with the span of $A$ (its column space) being two-dimensional and $\vec{b}$ not lying in the span of $A$, but I think my intuition holds for any vector space.

Minimising the length of the error vector is just a fancy name for finding the image of $A$ closest to $\vec{b}$, which geometrically amounts to projecting $\vec{b}$ onto the plane spanned by $A$ and then solving $A\vec{x}=\vec{b}_{\mathrm{proj}}$.
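
As a concrete illustration of that picture, here is a minimal NumPy sketch (the matrix $A$ and vector $\vec{b}$ below are made up for the example, not taken from the post): project $\vec{b}$ onto the span of $A$, then solve the now-consistent system $A\vec{x}=\vec{b}_{\mathrm{proj}}$.

```python
import numpy as np

# Hypothetical example: A is 3x2 with a two-dimensional column span,
# and b is chosen so that it does not lie in that span.
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
b = np.array([1.0, 2.0, 0.0])

# Project b onto span(A) using the projection matrix A (A^T A)^{-1} A^T.
P = A @ np.linalg.inv(A.T @ A) @ A.T
b_proj = P @ b

# A x = b_proj is now solvable exactly; its solution is the least-squares x.
x, *_ = np.linalg.lstsq(A, b_proj, rcond=None)
print(x)  # identical (up to rounding) to np.linalg.lstsq(A, b, rcond=None)[0]
```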

What I am looking for is a proof that the formula $\vec{x}=(A^T A)^{-1} A^T\vec{b}$ works which highlights this geometric intuition. I was hoping to find one part of the right-hand side representing the projection and another part representing the "inverse" of $A$, but I failed to find any such link.

Best Answer

I assume that $A$ has full rank and more rows than columns. Otherwise the notation $(A^T A)^{-1}$ would not make sense. In the OP's example, this means $A\in\mathbb{R}^{3\times 2}$ and $\vec{x}\in\mathbb{R}^2$.

$A\vec{x}=\vec{b}_{\mathrm{proj}}$ is another way of saying that the error vector $A\vec{x}-\vec{b}$ is perpendicular to the span of $A$. Therefore, $$ \langle \vec{v} , A\vec{x}-\vec{b} \rangle = 0 \;\;\forall\;\vec{v}\in\mathrm{span}(A). $$ Since the span of $A$ is spanned by its columns, it suffices to test this against each column of $A$, which is exactly $$ A^T (A\vec{x} - \vec{b}) = \vec{0}. $$ Because $A$ has full column rank, $A^T A$ is invertible, which immediately leads us to $$ \vec{x} = (A^T A)^{-1}A^T\vec{b}. $$
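
To see the derivation numerically, here is a short NumPy check (with the same made-up $A$ and $\vec{b}$ as above, purely illustrative): the closed-form $\vec{x}$ makes the error orthogonal to every column of $A$, and $A\vec{x}$ coincides with the projection $A(A^T A)^{-1}A^T\vec{b}$ of $\vec{b}$ onto the span of $A$.

```python
import numpy as np

# Hypothetical example data: A has full column rank and more rows than columns.
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
b = np.array([1.0, 2.0, 0.0])

# Closed-form least-squares solution x = (A^T A)^{-1} A^T b.
x = np.linalg.inv(A.T @ A) @ A.T @ b

# Orthogonality: the error A x - b is perpendicular to every column of A,
# i.e. A^T (A x - b) = 0.
print(A.T @ (A @ x - b))            # ~ [0, 0] up to floating-point error

# Equivalently, A x is the orthogonal projection of b onto span(A).
P = A @ np.linalg.inv(A.T @ A) @ A.T
print(np.allclose(A @ x, P @ b))    # True
```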
