In my work, I have repeatedly stumbled across the matrix $\Lambda=X(X^TX)^{-1}X^{T}$, where $X$ is a generic $m\times n$ matrix with $m>n$. It can be characterized by the following:
(1) If $v$ is in the span of the column vectors of $X$, then $\Lambda v=v$.
(2) If $v$ is orthogonal to the span of the column vectors of $X$, then $\Lambda v = 0$.
(We assume that $X$ has full rank.)
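The two properties can be checked numerically. Here is a small sketch in NumPy (the random $X$ and the test vectors are my own illustrative choices): a vector built from the columns of $X$ is fixed by $\Lambda$, while the residual of an arbitrary vector, which is orthogonal to the column span, is sent to zero.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 6, 3
X = rng.standard_normal((m, n))      # generic full-rank m x n matrix, m > n

# Lambda = X (X^T X)^{-1} X^T
Lam = X @ np.linalg.inv(X.T @ X) @ X.T

# (1) a vector in the column span of X is left unchanged
v_in = X @ np.array([1.0, -2.0, 0.5])
print(np.allclose(Lam @ v_in, v_in))

# (2) a vector orthogonal to the column span is mapped to 0
w = rng.standard_normal(m)
v_perp = w - Lam @ w                 # residual component, orthogonal to col(X)
print(np.allclose(Lam @ v_perp, 0))
```

Both checks print `True`; together they say that $\Lambda$ is the orthogonal projection onto the column space of $X$.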
I find this matrix neat, but for my work (in statistics) I need more intuition behind it. What does it mean in a probability context? We are deriving properties of linear regressions, where each row in $X$ is an observation.
Is this matrix known, and if so in what context (statistics would be optimal but if it is a celebrated operation in differential geometry, I'd be curious to hear as well)?
Best Answer
It is called the hat matrix. The idea is that this matrix "puts the hat on" $y$: it transforms the dependent variable into its prediction in linear regression.
The linear regression model is the following:
$$y=X\beta+\varepsilon.$$
The least squares estimate of $\beta$ is defined as
$$\hat\beta=(X^TX)^{-1}X^Ty.$$
The prediction of the model is then:
$$\hat{y}=X\hat\beta=X(X^TX)^{-1}X^Ty.$$
So the matrix $X(X^TX)^{-1}X^T$ transforms $y$ into $\hat{y}$, hence the name hat matrix.
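The derivation above can be verified on simulated data. This sketch (the sample size, true $\beta$, and noise level are arbitrary choices of mine) forms the hat matrix explicitly and checks that $Hy$ matches the prediction obtained from a standard least-squares fit:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 50, 2
X = rng.standard_normal((m, n))               # each row is one observation
beta = np.array([2.0, -1.0])                  # true coefficients (made up)
y = X @ beta + 0.1 * rng.standard_normal(m)   # y = X beta + noise

H = X @ np.linalg.inv(X.T @ X) @ X.T          # the hat matrix
y_hat = H @ y                                 # "puts the hat on" y

# same prediction as estimating beta-hat by least squares first
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(y_hat, X @ beta_hat))
```

This prints `True`: multiplying by $H$ in one step is exactly fitting $\hat\beta=(X^TX)^{-1}X^Ty$ and then predicting $X\hat\beta$. (In practice one computes $\hat\beta$ via `lstsq` or a QR factorization rather than forming $H$ or inverting $X^TX$ explicitly, for numerical stability.)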