[Math] The meaning behind $(X^TX)^{-1}$

linear algebra, regression, statistics

In linear algebra, we learn that the inverse of a matrix "undoes" the linear transformation. What exactly is the meaning of the inverse $(X^TX)^{-1}$?

We know $X^TX$ is a square matrix whose diagonal elements are the sums of squares of the columns of $X$. So what are we doing when we take the inverse of this? I have always used this property in my calculations but would like to understand more of the meaning behind it.

Best Answer

When $X$ is a real matrix, the elements of $(X^TX)^{-1}$ also provide a measure of the extent of linear dependence among the columns of $X$.

If $X^TX$ is invertible then the columns of $X$ have to be linearly independent, but sometimes the columns are "almost" dependent in a sense which will be made clear below.

Denote the $i$th column of $X$ by $x_i$ and let $\hat{x}_i$ denote the projection of $x_i$ onto the space spanned by $\{x_j : j \neq i \}$. Set $\epsilon_i = x_i - \hat{x}_i$. Note that if any $\|\epsilon_i\|$ is small, it indicates strong linear dependence among the columns of $X$.

One can show that the $(i,j)$th element of $(X^TX)^{-1}$ is $\dfrac{\epsilon_i^T\epsilon_j}{\|\epsilon_i\|^2\,\|\epsilon_j\|^2}.$
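If you'd rather convince yourself numerically than work through the proof, here is a minimal sketch (assuming Python with NumPy; the matrix is arbitrary made-up data). It computes each $\epsilon_i$ by regressing column $i$ on the remaining columns via least squares, assembles the matrix with entries $\epsilon_i^T\epsilon_j / (\|\epsilon_i\|^2\|\epsilon_j\|^2)$, and checks that it matches $(X^TX)^{-1}$.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 4))  # arbitrary full-column-rank matrix

n, p = X.shape
eps = np.empty_like(X)
for i in range(p):
    others = np.delete(X, i, axis=1)
    # least-squares projection of column i onto the span of the other columns
    coef, *_ = np.linalg.lstsq(others, X[:, i], rcond=None)
    eps[:, i] = X[:, i] - others @ coef  # epsilon_i = x_i - xhat_i

# matrix with (i,j) entry eps_i^T eps_j / (||eps_i||^2 ||eps_j||^2)
norms2 = np.sum(eps**2, axis=0)
G = (eps.T @ eps) / np.outer(norms2, norms2)

print(np.allclose(G, np.linalg.inv(X.T @ X)))  # True
```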

In particular, the $i$th diagonal element of $(X^TX)^{-1}$ is $\dfrac{1}{\|\epsilon_i\|^2}$. So if the $i$th column of $X$ is almost a linear combination of the other columns, this will show up as a very large value in the $i$th diagonal element of $(X^TX)^{-1}$.
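To see this diagnostic in action, here is another small sketch under the same NumPy assumption: the third column is constructed to be almost $x_1 + x_2$. Since each of $x_1$, $x_2$, $x_3$ is then nearly a linear combination of the others, the first three diagonal entries of $(X^TX)^{-1}$ blow up, while the entry for an unrelated fourth column stays moderate.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
x1 = rng.standard_normal(n)
x2 = rng.standard_normal(n)
x4 = rng.standard_normal(n)                    # unrelated column for comparison
x3 = x1 + x2 + 1e-3 * rng.standard_normal(n)   # almost x1 + x2

X = np.column_stack([x1, x2, x3, x4])
d = np.diag(np.linalg.inv(X.T @ X))
print(d)  # first three entries are orders of magnitude larger than the fourth
```

Shrinking the `1e-3` noise scale pushes the columns closer to exact dependence and makes those diagonal entries larger still.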