Regression – How Adding Regressors Changes the Estimated Variance of Error Terms

Tags: error, multiple-regression, regression, self-study

Consider the regression model $$Y=X\beta+\epsilon$$ where the error terms in $\epsilon=(\epsilon_1,\ldots,\epsilon_n)^T$ are homoskedastic and where $n$ is the number of observations. Assume that the model has an intercept and $k$ regressors; i.e., $\beta=(\beta_0,\ldots,\beta_k)^T$. Furthermore, denote the least squares estimate of $\beta$ by $\hat{\beta}$ and let $\hat{\epsilon}=Y-X\hat{\beta}$ be the vector of residuals.

Now, under the Gauss-Markov assumptions, $$\hat{\sigma}^2=\hat{\epsilon}^T\hat{\epsilon}/(n-k-1)\tag{1}$$ is an unbiased estimator of the variance $\sigma^2=E(\epsilon^2_i)$ of any error term $i$.
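(For reference, a standard derivation not stated in the question: with the annihilator matrix $M=I-X(X^TX)^{-1}X^T$, the residuals satisfy $\hat{\epsilon}=M\epsilon$, so $$E(\hat{\epsilon}^T\hat{\epsilon})=\sigma^2\,\mathrm{tr}(M)=\sigma^2(n-k-1),$$ since $X$ has $k+1$ columns.)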

It seems to me that the following holds: when I add regressors to my model, $k$ increases and consequently $1/(n-k-1)$ increases, while $\hat{\epsilon}^T\hat{\epsilon}$ decreases or stays constant (since adding regressors means minimizing the residual sum of squares over a larger domain). Hence, the effect of adding regressors on the estimated variance of the error term (i.e., on $\hat{\sigma}^2$) is ambiguous.
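For a concrete (purely hypothetical) illustration of the two opposing forces: suppose $n-k-1=100$ and $\hat{\epsilon}^T\hat{\epsilon}=100$, so that $\hat{\sigma}^2=1$. If adding one regressor lowers the residual sum of squares only to $99.9$, the new estimate is $$\hat{\sigma}^2=\frac{99.9}{99}\approx 1.009>1,$$ whereas a drop to, say, $95$ would give $95/99\approx 0.96<1$. Which way the estimate moves depends on how much the new regressor reduces the residual sum of squares.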

However, I keep hearing (e.g., from my professor in econometrics, in this lecture on multiple regression on page 23, and in two answers to a question here on Cross Validated) that the estimated variance of the error term decreases or stays constant when I increase the number of regressors. I do not agree $-$ am I missing something?

Best Answer

Here's a little R simulation that confirms your intuition:

set.seed(9955)

# Simulate N observations of p pure-noise regressors and a pure-noise response
N <- 10^3
p <- 100
X <- matrix(rnorm(N*p, mean=0, sd=1), N, p)
colnames(X) <- sprintf("x_%s", seq_len(ncol(X)))
y <- rnorm(N, mean=0, sd=1)  # True betas are zero
df <- as.data.frame(cbind(y, X))

# For each k, fit a model with an intercept and the first k regressors,
# then compute the unbiased variance estimate from equation (1)
estimated_sigma_squared <- sapply(seq_len(p), function(k) {
    message("estimating model with constant and ", k, " Xs")
    model_formula <- reformulate(response="y", termlabels=sprintf("x_%s", seq_len(k)))
    model <- lm(formula=model_formula, data=df)
    sigma_squared_hat <- sum(residuals(model)^2) / (N - k - 1)
    return(sigma_squared_hat)
})

plot(estimated_sigma_squared, type="l")
any(diff(estimated_sigma_squared) > 0)  # TRUE: the estimate sometimes increases

A ggplot2 version of the plot:

[Figure: $\hat{\sigma}^2$ as a function of the number of regressors]
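(The original ggplot2 code isn't shown; a minimal sketch that reproduces such a figure, reusing p and estimated_sigma_squared from the simulation above, could be:)

library(ggplot2)

# One row per model size k, with its variance estimate
plot_df <- data.frame(k = seq_len(p),
                      sigma_squared_hat = estimated_sigma_squared)
ggplot(plot_df, aes(x = k, y = sigma_squared_hat)) +
    geom_line() +
    labs(x = "number of regressors", y = expression(hat(sigma)^2))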

Note that the maximum likelihood estimate of $\sigma^2$ has $N$ in the denominator (as opposed to $N-k-1$), so it is proportional to the residual sum of squares and therefore cannot increase with $k$.
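A quick check of this claim, reusing the simulation objects from above (rescaling the unbiased estimates to the maximum likelihood denominator $N$):

# RSS_k / N: multiply the unbiased estimates by (N - k - 1) / N
mle_sigma_squared <- estimated_sigma_squared * (N - seq_len(p) - 1) / N
any(diff(mle_sigma_squared) > 0)  # FALSE: the MLE never increases with k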