Regression – How Adding Regressors Changes the Estimated Variance of Error Terms

Tags: error, multiple-regression, regression, self-study

Consider the regression model $$Y=X\beta+\epsilon$$ where the error terms in $\epsilon=(\epsilon_1,\ldots,\epsilon_n)^T$ are homoskedastic and where $n$ is the number of observations. Assume that the model has an intercept and $k$ regressors; i.e., $\beta=(\beta_0,\ldots,\beta_k)^T$. Furthermore, denote the least squares estimate of $\beta$ by $\hat{\beta}$ and let $\hat{\epsilon}=Y-X\hat{\beta}$ be the vector of residuals.

Now, under the Gauss-Markov assumptions, $$\hat{\sigma}^2=\hat{\epsilon}^T\hat{\epsilon}/(n-k-1)\tag{1}$$ is an unbiased estimator of the variance $\sigma^2=E(\epsilon^2_i)$ of any error term $i$.
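(For reference, a standard derivation not stated in the question: with the annihilator matrix $M=I-X(X^TX)^{-1}X^T$, the residuals satisfy $\hat{\epsilon}=M\epsilon$, so $$E(\hat{\epsilon}^T\hat{\epsilon})=\sigma^2\,\mathrm{tr}(M)=\sigma^2(n-k-1),$$ since $X$ has $k+1$ columns.)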

It seems to me that the following holds: when I add regressors to my model, $k$ increases and consequently $1/(n-k-1)$ increases, while $\hat{\epsilon}^T\hat{\epsilon}$ decreases or stays constant (since adding regressors means minimizing the residual sum of squares over a larger domain). Hence, the effect of adding regressors on the estimated variance of the error term (i.e., on $\hat{\sigma}^2$) is ambiguous.
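For a concrete (purely hypothetical) illustration of the two opposing forces: suppose $n-k-1=100$ and $\hat{\epsilon}^T\hat{\epsilon}=100$, so that $\hat{\sigma}^2=1$. If adding one regressor lowers the residual sum of squares only to $99.9$, the new estimate is $$\hat{\sigma}^2=\frac{99.9}{99}\approx 1.009>1,$$ whereas a drop to, say, $95$ would give $95/99\approx 0.96<1$. Which way the estimate moves depends on how much the new regressor reduces the residual sum of squares.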

However, I keep hearing (e.g., from my professor in econometrics, in this lecture on multiple regression on page 23, and in two answers to a question here on Cross Validated) that the estimated variance of the error term decreases or stays constant when I increase the number of regressors. I do not agree $-$ am I missing something?

Best Answer

Here's a little R simulation that confirms your intuition:

set.seed(9955)

# Simulate N observations of p pure-noise regressors and a pure-noise response
N <- 10^3
p <- 100
X <- matrix(rnorm(N*p, mean=0, sd=1), N, p)
colnames(X) <- sprintf("x_%s", seq_len(ncol(X)))
y <- rnorm(N, mean=0, sd=1)  # True betas are zero
df <- as.data.frame(cbind(y, X))

# For each k, fit a model with an intercept and the first k regressors,
# then compute the unbiased variance estimate from equation (1)
estimated_sigma_squared <- sapply(seq_len(p), function(k) {
    message("estimating model with constant and ", k, " Xs")
    model_formula <- reformulate(response="y", termlabels=sprintf("x_%s", seq_len(k)))
    model <- lm(formula=model_formula, data=df)
    sigma_squared_hat <- sum(residuals(model)^2) / (N - k - 1)
    return(sigma_squared_hat)
})

plot(estimated_sigma_squared, type="l")
any(diff(estimated_sigma_squared) > 0)  # TRUE: the estimate sometimes increases

A ggplot2 version of the plot:

[Figure: $\hat{\sigma}^2$ as a function of the number of regressors]
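(The original ggplot2 code isn't shown; a minimal sketch that reproduces such a figure, reusing p and estimated_sigma_squared from the simulation above, could be:)

library(ggplot2)

# One row per model size k, with its variance estimate
plot_df <- data.frame(k = seq_len(p),
                      sigma_squared_hat = estimated_sigma_squared)
ggplot(plot_df, aes(x = k, y = sigma_squared_hat)) +
    geom_line() +
    labs(x = "number of regressors", y = expression(hat(sigma)^2))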

Note that the maximum likelihood estimate of $\sigma^2$ has $N$ in the denominator (as opposed to $N-k-1$), so it is proportional to the residual sum of squares and therefore cannot increase with $k$.
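A quick check of this claim, reusing the simulation objects from above (rescaling the unbiased estimates to the maximum likelihood denominator $N$):

# RSS_k / N: multiply the unbiased estimates by (N - k - 1) / N
mle_sigma_squared <- estimated_sigma_squared * (N - seq_len(p) - 1) / N
any(diff(mle_sigma_squared) > 0)  # FALSE: the MLE never increases with k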