Solved – Why the variance of the OLS estimator decreases as sample size increases

least squares, multiple regression, regression, variance

Let $X$ be an $n\times (p+1)$ non-stochastic design matrix. The OLS estimator is given by

$$\hat{\beta} = (X'X)^{-1}X' y$$

Thus the variance of the estimator is

$$\text{Var}\left( \hat{\beta}\right) = (X'X)^{-1} \sigma^2\, , $$

where $\text{Var}(y) = I_n \sigma^2$.
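
For concreteness, here is a quick Monte Carlo sketch of this formula (assuming numpy; the design matrix, $\beta$, and $\sigma$ below are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, sigma = 50, 2, 1.5
# n x (p+1) design: intercept column plus p simulated regressors
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
beta = np.array([1.0, -2.0, 0.5])

# Monte Carlo: draw many y's, refit OLS, and compare the empirical
# covariance of beta_hat with the theoretical (X'X)^{-1} sigma^2.
betas = []
for _ in range(20000):
    y = X @ beta + sigma * rng.normal(size=n)
    betas.append(np.linalg.solve(X.T @ X, X.T @ y))
empirical = np.cov(np.array(betas), rowvar=False)
theoretical = sigma**2 * np.linalg.inv(X.T @ X)
print(np.max(np.abs(empirical - theoretical)))  # small, e.g. ~1e-3
```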

My question is: why is it true that the variance of the estimator decreases as the sample size increases? It is not obvious to me how the $i$-th diagonal entry of $(X'X)^{-1}$ behaves as $n$ grows.

Best Answer

If we assume that $\sigma^2$ is known, the variance of the OLS estimator depends only on $X'X$, because we do not need to estimate $\sigma^2$. Here is a purely algebraic proof that the variance of the estimator cannot increase with any additional observation when $\sigma^2$ is known.

Suppose $X$ is your current design matrix and you add one more observation $x$, a row vector of dimension $1\times (p+1)$. Your new design matrix is
$$X_{new} = \left(\begin{array}{c}X \\ x \end{array}\right).$$
You can check that $X_{new}'X_{new} = X'X + x'x$. By the Sherman–Morrison formula (the rank-one case of the Woodbury identity),
$$ (X_{new}'X_{new})^{-1} = (X'X + x'x)^{-1} = (X'X)^{-1} - \frac{(X'X)^{-1}x'x(X'X)^{-1}}{1+x(X'X)^{-1}x'}. $$
The numerator of the subtracted term equals $vv'$ with $v = (X'X)^{-1}x'$, i.e. a matrix times its own transpose, so it is positive semi-definite; the denominator satisfies $1+x(X'X)^{-1}x' > 0$ because $(X'X)^{-1}$ is positive definite. Hence the diagonal elements of the subtracted term are greater than or equal to zero, and the diagonal elements of $(X_{new}'X_{new})^{-1}$ are less than or equal to the corresponding diagonal elements of $(X'X)^{-1}$. Since $\text{Var}(\hat{\beta}_i) = \sigma^2 \left[(X'X)^{-1}\right]_{ii}$, each coefficient's variance is non-increasing in the sample size.
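
A numerical sketch of this argument (assuming numpy; the dimensions and the appended row $x$ are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 30, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
x = np.concatenate([[1.0], rng.normal(size=p)])  # new observation, length p+1

A_inv = np.linalg.inv(X.T @ X)

# Sherman-Morrison form of (X'X + x'x)^{-1}: subtract the rank-one term
update = np.outer(A_inv @ x, x @ A_inv) / (1.0 + x @ A_inv @ x)
new_inv_sm = A_inv - update

# Direct computation after appending the row to the design matrix
X_new = np.vstack([X, x])
new_inv_direct = np.linalg.inv(X_new.T @ X_new)

print(np.allclose(new_inv_sm, new_inv_direct))        # True: identity holds
print(np.all(np.diag(new_inv_sm) <= np.diag(A_inv)))  # True: diagonals shrink
```

The second check is exactly the conclusion of the proof: every diagonal entry of $(X_{new}'X_{new})^{-1}$ is at most the corresponding entry of $(X'X)^{-1}$.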