# In linear regression, what’s the asymptotic distribution of the error variance estimator

Tags: least-squares, regression, residuals, variance

Suppose $$Y_i=X_i'\beta+\epsilon_i$$ with $$E(\epsilon_i|X_i)=0$$ and $$E(\epsilon^2_i)=\sigma^2$$, and I estimate $$\sigma^2$$ using $$s^2=\frac{1}{n}\sum_{i=1}^n (Y_i-X_i'\widehat{\beta})^2$$, where $$\widehat{\beta}$$ is the OLS estimator of $$\beta$$. What is the asymptotic distribution of (a properly scaled and centered version of) $$s^2$$? It would be great if you could provide the detailed steps of the derivation. Thanks in advance!

This answer extends the sample mean case studied in Asymptotic distribution of $\sqrt{n}(\hat{\sigma}_{1}^{2}-\sigma^2)$ to the linear regression setup.

Let us assume an iid sample from a linear model $$y_i=x_i'\beta+\epsilon_i$$ with first four conditional error moments $$\{E(\epsilon_i^p|x_i)\}_{p=1}^4=(0,\sigma^2,\mu_3,\mu_4)'$$. Note $$\varsigma^2:=Var(\epsilon_i^2|x_i) = E(\epsilon_i^4|x_i)-[E(\epsilon_i^2|x_i)]^2=\mu_4-\sigma^4$$ and assume $$0<\varsigma^2 <\infty$$. That is, we require higher-moment assumptions not required for, e.g., "mere" consistency of the OLS estimator. In addition to assuming homoskedasticity, we also assume what one might call "homokurtosis", i.e., $$Var(\epsilon_i^2|x_i)$$ does not depend on $$x_i$$.
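As a quick numerical sanity check of the moment identity $$\varsigma^2=\mu_4-\sigma^4$$, here is a Monte Carlo sketch using standard Laplace errors (scale 1, so $$\sigma^2=2$$ and $$\mu_4=24$$ — the same distribution as in the simulation further below):

```r
# Monte Carlo check of varsigma^2 = mu_4 - sigma^4 for Laplace(0, 1) errors:
# sigma^2 = 2 and mu_4 = 24, so varsigma^2 should be about 24 - 2^2 = 20
set.seed(1)
eps <- VGAM::rlaplace(1e6)
mean(eps^4)  # close to 24
var(eps^2)   # close to 20
```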

We may then add and subtract to get \begin{align*} s^2 & = \frac{1}{n}\sum_{i=1}^n (y_i-x_i'\hat\beta)^2\\ & = \frac{1}{n}\sum_{i=1}^n \left((y_i- x_i'\beta) + (x_i'\hat\beta-x_i'\beta)\right)^2\\ & = \frac{1}{n}\sum_{i=1}^n \left(\epsilon_i + (\hat\beta-\beta)'x_i\right)^2\\ & = \frac{1}{n}\sum_{i=1}^n \epsilon_i^2 + 2 (\hat\beta-\beta)'\frac{1}{n} \sum_{i=1}^nx_i\epsilon_i + \frac{1}{n}\sum_{i=1}^n[(\hat\beta-\beta)'x_i]^2\\ & = \underbrace{\frac{1}{n}\sum_{i=1}^n \epsilon_i^2}_{A} + \underbrace{2 (\hat\beta-\beta)'\frac{1}{n} \sum_{i=1}^nx_i\epsilon_i}_B+ \underbrace{(\hat\beta-\beta)'\frac{1}{n}\sum_{i=1}^nx_ix_i'(\hat\beta-\beta)}_C \end{align*}

$$B$$ and $$C$$ are asymptotically negligible under suitable assumptions. The OLS estimator converges at rate $$\sqrt{n}$$, i.e., $$\hat\beta-\beta=O_p(n^{-1/2})$$. Under predeterminedness (implied by the exogeneity condition you state), a WLLN gives $$\frac{1}{n} \sum_{i=1}^nx_i\epsilon_i\to_pE(x_i\epsilon_i)=0,$$ i.e., $$\frac{1}{n} \sum_{i=1}^nx_i\epsilon_i=o_p(1).$$ Thus, $$B = O_p(n^{-1/2})\,o_p(1)=o_p(n^{-1/2}).$$ Similarly, with existing second moments of the regressors we have $$\frac{1}{n}\sum_{i=1}^nx_ix_i'\to_pE(x_ix_i')<\infty,$$ so that $$\frac{1}{n}\sum_{i=1}^nx_ix_i'=O_p(1).$$ Then, $$C=O_p(n^{-1/2})O_p(1)O_p(n^{-1/2})=O_p(n^{-1})=o_p(n^{-1/2}).$$
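The negligibility of $$B$$ and $$C$$ can also be checked numerically. Here is a minimal sketch assuming a single standard normal regressor and a true coefficient of $$\beta=2$$ (both illustrative choices); it reports $$\sqrt{n}B$$ and $$\sqrt{n}C$$, which should shrink toward zero as $$n$$ grows:

```r
set.seed(1)
decomp <- function(n){
  x    <- rnorm(n)
  eps  <- rnorm(n)
  y    <- 2*x + eps                        # true beta = 2 (illustrative choice)
  bhat <- unname(coef(lm(y ~ x - 1)))      # OLS without intercept
  B    <- 2*(bhat - 2)*mean(x*eps)
  C    <- (bhat - 2)^2*mean(x^2)
  sqrt(n)*c(B = B, C = C)                  # both are o_p(n^{-1/2}), so these vanish
}
decomp(100); decomp(10000); decomp(1e6)
```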

Hence, we may now focus on the key term $$A$$. By assumption, $$0<\varsigma^2 <\infty$$, so the central limit theorem applies to the iid sequence $$\epsilon_i^2$$ with mean $$E(\epsilon_i^2)=\sigma^2$$ and variance $$\varsigma^2$$, and we obtain $$\sqrt{n}(s^2-\sigma^2)=\sqrt{n}\left(\frac{1}{n}\sum_{i=1}^n \epsilon_i^2-\sigma^2\right)+o_p(1) \stackrel{d}{\to}\mathcal{N}(0,\varsigma^2).$$

Writing $$\varsigma^2=\mu_4-\sigma^4$$ highlights the dependence of the asymptotic distribution on higher moments. E.g., for normally distributed errors, $$\mu_4=3\sigma^4$$, so that $$\varsigma^2=2\sigma^4$$.
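The normal-case relation $$\mu_4=3\sigma^4$$ can likewise be checked by simulation; a sketch assuming $$\sigma^2=4$$ (an arbitrary illustrative value):

```r
# Monte Carlo check of mu_4 = 3*sigma^4 (and hence varsigma^2 = 2*sigma^4)
# for normal errors with sigma^2 = 4
set.seed(1)
eps <- rnorm(1e6, sd = 2)
mean(eps^4)  # close to 3*sigma^4 = 48
var(eps^2)   # close to 2*sigma^4 = 32
```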

Here are two illustrations, for normal and Laplace errors; switch between them by toggling the commented lines:

```r
n <- 2000

errvar <- function(n){
  # y <- rnorm(n)           # i.e., an error variance of sigma^2 = 1
  y <- VGAM::rlaplace(n)    # i.e., error variance sigma^2 = 2
  x <- rnorm(n)
  limo <- summary(lm(y~x))
  sighatsq <- limo$sigma^2  # here, we still have the (asymptotically negligible) d.o.f. correction
  # sqrt(n)*(sighatsq - 1)  # normal case
  sqrt(n)*(sighatsq - 2)
}

MASS::truehist(replicate(50000, errvar(n)), col="lightblue", nbins=50)
xax <- seq(-15, 15, by=.01)
# lines(xax, dnorm(xax, sd=sqrt(2)), lwd=2, col="darkgreen")  # normal case: varsigma^2 = 2*sigma^4 = 2
lines(xax, dnorm(xax, sd=sqrt(20)), lwd=2, col="darkgreen")   # Laplace case: mu_4 = 24 (see https://stats.stackexchange.com/questions/10982/moments-of-laplace-distribution), i.e., varsigma^2 = 24 - 2^2 = 20
```


The resulting picture (for the Laplace case) shows close agreement between the simulated distribution of $$\sqrt{n}(s^2-\sigma^2)$$ and the limiting $$\mathcal{N}(0,20)$$ density.