Regression – Intuitive Explanation of the $(X^TX)^{-1}$ Term in the Variance of the Least Squares Estimator

intuition, least squares, regression, variance

If $X$ is full rank, the inverse of $X^TX$ exists and we
get the least squares estimate: $$\hat\beta = (X^TX)^{-1}X^TY$$
and $$\operatorname{Var}(\hat\beta) = \sigma^2(X^TX)^{-1}$$
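For concreteness, a minimal numerical sketch of these two formulas (Python/NumPy; the design matrix, true coefficients, noise level, and number of replications are all assumed purely for illustration). It re-draws the error term many times and compares the empirical covariance of $\hat\beta$ with $\sigma^2(X^TX)^{-1}$:

```python
import numpy as np

rng = np.random.default_rng(0)

n, p, sigma = 200, 3, 2.0          # sample size, regressors, noise s.d. (all assumed)
X = rng.normal(size=(n, p))        # fixed design matrix
beta = np.array([1.0, -0.5, 2.0])  # true coefficients (assumed)

XtX_inv = np.linalg.inv(X.T @ X)
theoretical_cov = sigma**2 * XtX_inv          # sigma^2 (X'X)^{-1}

# Monte Carlo: re-draw the error term many times and re-estimate beta
estimates = []
for _ in range(20000):
    y = X @ beta + sigma * rng.normal(size=n)
    estimates.append(XtX_inv @ X.T @ y)       # beta_hat = (X'X)^{-1} X'y
empirical_cov = np.cov(np.array(estimates), rowvar=False)

print(np.round(theoretical_cov, 4))
print(np.round(empirical_cov, 4))             # should closely match the line above
```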

How can we intuitively explain $(X^TX)^{-1}$ in the variance formula? The technique of derivation is clear to me.

Best Answer

Consider a simple regression without a constant term, where the single regressor is centered on its sample mean. Then $X'X$ is ($n$ times) its sample variance, and $(X'X)^{-1}$ is its reciprocal. So the higher the variance (variability) of the regressor, the lower the variance of the coefficient estimator: the more variability we have in the explanatory variable, the more accurately we can estimate the unknown coefficient.
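A minimal sketch of this one-regressor case (Python/NumPy; the sample size, error variance, true coefficient, and regressor spreads are arbitrary values chosen only for illustration). With a single centered regressor, $\operatorname{Var}(\hat\beta) = \sigma^2 / \sum_i x_i^2$, so spreading the regressor out shrinks the estimator's variance:

```python
import numpy as np

rng = np.random.default_rng(1)
n, sigma = 100, 1.0                # sample size and noise s.d. (assumed)

def coef_variance(spread):
    """Var(beta_hat) = sigma^2 / sum(x_i^2) for a single centered regressor."""
    x = rng.normal(scale=spread, size=n)
    x = x - x.mean()               # center on the sample mean
    return sigma**2 / np.sum(x**2) # sigma^2 (X'X)^{-1}, here a scalar

print(coef_variance(spread=0.5))   # low variability in x  -> high variance of beta_hat
print(coef_variance(spread=5.0))   # high variability in x -> low variance of beta_hat
```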

Why? Because the more a regressor varies, the more information it contains. When there are many regressors, this generalizes to the inverse of their variance-covariance matrix, which also takes into account the co-variability of the regressors. In the extreme case where $X'X$ is diagonal, the precision of each estimated coefficient depends only on the variance/variability of the associated regressor (given the variance of the error term).
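The diagonal case can be illustrated the same way (again Python/NumPy, with assumed sample size, noise level, and regressor spreads). In this hypothetical sketch the second regressor is orthogonalized against the first so that $X'X$ is numerically diagonal; each coefficient's variance then reduces to $\sigma^2$ divided by its own regressor's sum of squares:

```python
import numpy as np

rng = np.random.default_rng(2)
n, sigma = 500, 1.0                       # sample size and noise s.d. (assumed)

# Two centered regressors with very different spreads, made exactly orthogonal
x1 = rng.normal(scale=0.5, size=n)
x2 = rng.normal(scale=5.0, size=n)
x1 -= x1.mean()
x2 -= x2.mean()
x2 -= (x1 @ x2) / (x1 @ x1) * x1          # remove the x1 component so X'X is diagonal
X = np.column_stack([x1, x2])

cov = sigma**2 * np.linalg.inv(X.T @ X)   # sigma^2 (X'X)^{-1}
print(np.round(cov, 6))
# Off-diagonals are ~0; each diagonal entry is sigma^2 / sum(x_j^2),
# so the coefficient on the low-spread x1 is estimated far less precisely.
```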