In linear regression, what’s the asymptotic distribution of the error variance estimator

Tags: least-squares, regression, residuals, variance

Suppose $Y_i=X_i'\beta+\epsilon_i$ with $E(\epsilon_i|X_i)=0$ and $E\epsilon^2_i=\sigma^2$ and I estimate $\sigma^2$ using $s^2=\frac{1}{n}\sum_{i=1}^n (Y_i-X_i'\widehat{\beta})^2$, where $\widehat{\beta}$ is the OLS estimator of $\beta$. I'm wondering what's the asymptotic distribution of (a properly scaled and normalized version of) $s^2$? It would be great if you could provide the detailed steps of the derivation. Thanks in advance!

Best Answer

This answer extends the sample mean case studied in Asymptotic distribution of $\sqrt{n}(\hat{\sigma}_{1}^{2}-\sigma^2)$ to the linear regression setup.

Let us assume an iid sample from a linear model $y_i=x_i'\beta+\epsilon_i$ with first four conditional error moments $\{E(\epsilon_i^p|x_i)\}_{p=1}^4=(0,\sigma^2,\mu_3,\mu_4)'$. Note $$\varsigma^2:=Var(\epsilon_i^2|x_i) = E(\epsilon_i^4|x_i)-[E(\epsilon_i^2|x_i)]^2=\mu_4-\sigma^4$$ and assume $0<\varsigma^2 <\infty$. That is, we require higher-moment assumptions not required for, e.g., "mere" consistency of the OLS estimator. In addition to assuming homoskedasticity, we also assume what one might call "homokurtosis", i.e., $Var(\epsilon_i^2|x_i)$ does not depend on $x_i$.

We may then add and subtract to get \begin{align*} s^2 & = \frac{1}{n}\sum_{i=1}^n (y_i-x_i'\hat\beta)^2\\ & = \frac{1}{n}\sum_{i=1}^n ((y_i- x_i'\beta) - (x_i'\hat\beta-x_i'\beta))^2\\ & = \frac{1}{n}\sum_{i=1}^n (\epsilon_i - (\hat\beta-\beta)'x_i)^2\\ & = \frac{1}{n}\sum_{i=1}^n \epsilon_i^2 - 2 (\hat\beta-\beta)'\frac{1}{n} \sum_{i=1}^nx_i\epsilon_i + \frac{1}{n}\sum_{i=1}^n[(\hat\beta-\beta)'x_i]^2\\ & = \underbrace{\frac{1}{n}\sum_{i=1}^n \epsilon_i^2}_{A} - \underbrace{2 (\hat\beta-\beta)'\frac{1}{n} \sum_{i=1}^nx_i\epsilon_i}_B+ \underbrace{(\hat\beta-\beta)'\frac{1}{n}\sum_{i=1}^nx_ix_i'(\hat\beta-\beta)}_C \end{align*} (Note the minus sign on the cross term: the residual is $\epsilon_i - x_i'(\hat\beta-\beta)$.)
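As a quick numerical sanity check, here is a minimal R sketch (with a hypothetical two-regressor design and made-up coefficients $(1,2)'$) confirming that the decomposition holds exactly in finite samples; the cross term enters with a minus sign since the residual is $\epsilon_i - x_i'(\hat\beta-\beta)$:

```r
set.seed(1)
n <- 100
x <- cbind(1, rnorm(n))            # hypothetical design: intercept + one regressor
beta <- c(1, 2)                    # made-up true coefficients
eps <- rnorm(n)
y <- x %*% beta + eps
betahat <- solve(crossprod(x), crossprod(x, y))  # OLS via normal equations
s2 <- mean((y - x %*% betahat)^2)
A <- mean(eps^2)
cross <- -2 * t(betahat - beta) %*% colMeans(x * c(eps))  # minus sign, see text
C <- t(betahat - beta) %*% (crossprod(x) / n) %*% (betahat - beta)
all.equal(s2, c(A + cross + C))    # TRUE: the identity is exact in finite samples
```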

$B$ and $C$ are asymptotically negligible under suitable assumptions: the OLS estimator converges at rate $\sqrt{n}$, i.e., $\hat\beta-\beta=O_p(n^{-1/2})$. Under predeterminedness (implied by the exogeneity condition you state), a WLLN gives $$ \frac{1}{n} \sum_{i=1}^nx_i\epsilon_i\to_pE(x_i\epsilon_i)=0, $$ i.e., $$\frac{1}{n} \sum_{i=1}^nx_i\epsilon_i=o_p(1).$$ Thus, \begin{align*} B & = O_p(n^{-1/2})o_p(1)=o_p(n^{-1/2}). \end{align*} Similarly, with existing second moments of the regressors we have $$ \frac{1}{n}\sum_{i=1}^nx_ix_i'\to_pE(x_ix_i')<\infty, $$ so that $$ \frac{1}{n}\sum_{i=1}^nx_ix_i'=O_p(1). $$ Then, $$ C=O_p(n^{-1/2})O_p(1)O_p(n^{-1/2})=O_p(n^{-1})=o_p(n^{-1/2}). $$
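To see these rates at work, a small simulation sketch (same hypothetical intercept-plus-one-regressor design as above, with assumed coefficients $(1,2)'$): after scaling by $\sqrt{n}$, both the absolute cross term and the quadratic term should shrink toward zero as $n$ grows, consistent with both being $o_p(n^{-1/2})$:

```r
set.seed(42)
rateBC <- function(n) {
  x <- cbind(1, rnorm(n))                      # hypothetical design
  eps <- rnorm(n)
  y <- x %*% c(1, 2) + eps                     # assumed true beta = (1, 2)'
  betahat <- solve(crossprod(x), crossprod(x, y))
  d <- betahat - c(1, 2)
  B <- 2 * abs(t(d) %*% colMeans(x * c(eps)))  # |cross term|
  C <- t(d) %*% (crossprod(x) / n) %*% d       # quadratic term
  sqrt(n) * c(B = B, C = C)                    # scaled by sqrt(n)
}
out <- sapply(c(1e2, 1e4, 1e6), rateBC)
round(out, 4)  # both rows shrink toward zero as n grows
```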

Hence, we may focus on the key term $A$ now. By assumption we have $0<\varsigma^2 <\infty$ and hence, by an application of the central limit theorem with $E(\epsilon_i^2)=\sigma^2$ we obtain that $$\sqrt{n}(s^2-\sigma^2)=\sqrt{n}\left(\frac{1}{n}\sum_{i=1}^n \epsilon_i^2+o_p(n^{-1/2})-\sigma^2\right) \stackrel{d}{\to}\mathcal{N}(0,\varsigma^2).$$

Writing $$ \varsigma^2=\mu_4-\sigma^4, $$ highlights the dependence of the asymptotic distribution on higher moments. E.g., for normally distributed errors, $\mu_4=3\sigma^4$, so that $\varsigma^2=2\sigma^4$.
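A quick simulation check of this moment calculation (standard normal errors, so $\sigma^2=1$ by construction): the sample fourth moment should be close to $3\sigma^4=3$ and the implied $\varsigma^2$ close to $2\sigma^4=2$:

```r
set.seed(7)
eps <- rnorm(1e6)                  # sigma^2 = 1
mu4 <- mean(eps^4)                 # should be close to 3*sigma^4 = 3
varsigma2 <- mu4 - mean(eps^2)^2   # should be close to 2*sigma^4 = 2
c(mu4 = mu4, varsigma2 = varsigma2)
```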

Two illustrations, based on normal and Laplace errors depending on which lines you comment in:

n <- 2000

errvar <- function(n){
  # y <- rnorm(n) # i.e., an error variance of sigma^2=1
  y <- VGAM::rlaplace(n) # i.e., error variance sigma^2=2
  x <- rnorm(n)
  limo <- summary(lm(y~x))
  sighatsq <- limo$sigma^2 # here, we still have the (asymptotically negligible) d.o.f. correction
  # sqrt(n)*(sighatsq - 1) # normal case
  sqrt(n)*(sighatsq - 2) # Laplace case
}

MASS::truehist(replicate(50000, errvar(n)), col="lightblue", nbins=50)
xax <- seq(-15, 15, by=.01)
# lines(xax, dnorm(xax, sd=sqrt(2)), lwd=2, col="darkgreen") # normal case: varsigma^2=2
lines(xax, dnorm(xax, sd=sqrt(20)), lwd=2, col="darkgreen") # Laplace case: 4th moment 24, i.e., varsigma^2=24-2^2=20

The picture shows the Laplace case:

[Histogram of the simulated statistic $\sqrt{n}(s^2-2)$ with the $\mathcal{N}(0,20)$ density overlaid]