Degree of freedom (df) in maximum likelihood estimate (MLE) for linear regression

Tags: degrees-of-freedom, hypothesis-testing, mathematical-statistics, maximum-likelihood, regression

I am reading Bain and Engelhardt's "Introduction to Probability and Mathematical Statistics", the section on maximum likelihood estimation (MLE) for linear regression (pp. 519–522).

I first summarize three key points from the textbook that I am interested in.

(1) The MLEs are $ \hat{\beta} = (X^TX)^{-1}X^TY $ and $ \hat{\sigma}^2 = \frac{(Y-X\hat{\beta})^T(Y-X\hat{\beta})}{n} $. For this part, I totally get it.
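As a sanity check on these two formulas, here is a minimal NumPy sketch that computes both estimators directly from the normal equations. The design matrix, sample size, and coefficients are made up purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 2                          # n observations, p predictors (plus an intercept)
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])  # design matrix with intercept
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(scale=1.5, size=n)

# MLE of the coefficients: beta_hat = (X'X)^{-1} X'y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# MLE of the error variance: residual sum of squares divided by n (not n - p - 1)
resid = y - X @ beta_hat
sigma2_mle = (resid @ resid) / n

print(beta_hat)
print(sigma2_mle)
```

Solving the normal equations with `np.linalg.solve` is equivalent to the textbook formula $(X^TX)^{-1}X^TY$ but avoids forming the inverse explicitly.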

(2) Further, the book mentions the following: $\tilde{\sigma}^2 = \frac{(Y-X\hat{\beta})^T(Y-X\hat{\beta})}{n-p-1} $ is the UMVUE of $\sigma^2$. Okay, for this, I understand that it is an unbiased estimator. So no problem.
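A quick Monte Carlo sketch illustrates why the $n-p-1$ divisor gives an unbiased estimator while the MLE divisor $n$ is biased low (the design, coefficients, and noise variance below are arbitrary choices for the simulation):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, sigma2 = 20, 2, 4.0
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
beta = np.array([0.5, 1.0, -1.0])

mle_vals, umvue_vals = [], []
for _ in range(20_000):
    y = X @ beta + rng.normal(scale=np.sqrt(sigma2), size=n)
    b = np.linalg.solve(X.T @ X, X.T @ y)
    rss = np.sum((y - X @ b) ** 2)
    mle_vals.append(rss / n)              # MLE: divides by n, biased low
    umvue_vals.append(rss / (n - p - 1))  # UMVUE: divides by n - p - 1, unbiased

print(np.mean(mle_vals))    # noticeably below sigma2 = 4
print(np.mean(umvue_vals))  # close to sigma2 = 4
```

In expectation the MLE equals $\sigma^2 (n-p-1)/n$, which with $n=20$ and $p=2$ is $4 \times 17/20 = 3.4$, matching what the simulation shows.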

(3) Further, it says the following: $T=\frac{\hat{\beta}_j-\beta_j}{\sqrt{\tilde{\sigma}^2 a_{jj}}} \sim t(n-p-1)$, where $a_{jj}$ is the diagonal entry of $(X^TX)^{-1}$ corresponding to $\beta_j$. Basically, this can be used for hypothesis testing. For $H_0: \beta_j = \beta_{j0}$, we reject it if $|t| > t_{1-\alpha/2}(n-p-1)$.

If I understand correctly, typically we assume $\beta_{j0} = 0$, and thus we can calculate the t-statistic as $t = \frac{\hat{\beta}_j}{\sqrt{\tilde{\sigma}^2 a_{jj}}}$, to be compared against $t(n-p-1)$.
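Putting the pieces together, here is a short Python sketch (synthetic simple-regression data; SciPy is used only for the $t$ tail probabilities) that computes these t-statistics and two-sided p-values exactly as in the formula above:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n = 30
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])          # simple linear regression: one predictor
y = 1.0 + 2.0 * x + rng.normal(size=n)

p = X.shape[1] - 1                            # predictors, not counting the intercept
XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
resid = y - X @ beta_hat
sigma2_tilde = resid @ resid / (n - p - 1)    # unbiased variance estimate

# t-statistic for H0: beta_j = 0, using a_jj from the diagonal of (X'X)^{-1}
se = np.sqrt(sigma2_tilde * np.diag(XtX_inv))
t_stat = beta_hat / se
p_val = 2 * stats.t.sf(np.abs(t_stat), df=n - p - 1)   # df = n - 2 here

print(t_stat, p_val)
```

These t-statistics and p-values should agree with what `summary(lm(y ~ x))` would report in R for the same data.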

OK, my two questions are as follows.

(1) If we use MLE to estimate the regression coefficients of a linear regression, it seems we can use a t-test to evaluate the significance level, right? It seems yes based on this textbook. If so, what are the degrees of freedom for this t-test? It seems they should be $n-p-1$ ($p$ not including the intercept). Thus, for simple linear regression, it would be $n-1-1=n-2$. Correct?

(2) We know that MLE typically also estimates the variance of the noise term, $\hat{\sigma}^2$. I know it counts as one parameter. Does it use up one more degree of freedom? If so, why is there no need to use $n-3$ in the t-test for the regression coefficients? Is it because $\hat{\beta}$ and $\hat{\sigma}^2$ are independent, so that estimating $\sigma^2$ does not affect the df of the t-test for the regression coefficients?

Thank you so much! I look forward to any feedback and help.

Added Content:

Even though I added this in the comment section of Rachel's answer, I think it is better to put it here, since here it can be formatted as code.

In particular, there is another post (linked below) about the degrees of freedom in MLE for linear regression:

What does the degree of freedom (df) mean in the results of log-likelihood `logLik`

```r
> m <- lm(mpg ~ hp, data = mtcars)
> logLik(m)
'log Lik.' -87.61931 (df=3)
```

Regarding the R code output shown above, the following is my additional question, namely question (3):

(3) I understand that the t-test's df for the regression coefficients is $n-p-1$; thus, for simple linear regression, it is $n-2$. If so, why does `logLik` in R return df=3? Is it because 3 here means 3 parameters, and not necessarily 3 df per se? Thank you.

To answer question (3) myself, based on the discussion with Rachel and others (see the comments under this question and under Rachel's answer):

`logLik(m)` returning df=3 means that 3 parameters are estimated (intercept, slope, and variance). However, since the estimate of $\sigma^2$ is computed by a formula from the residuals of the fitted intercept and slope, it does not cost one more df. Thus, the actual df of the t-test for the regression coefficients is still $n-2$ (one df for the intercept and one for the slope, in the context of simple linear regression).
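To make the parameter count concrete, here is a Python stand-in for R's `logLik` on synthetic simple-regression data (the data and coefficients are invented; the point is only where the 3 comes from):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 32
x = rng.normal(size=n)
y = 30.0 - 0.07 * x + rng.normal(size=n)

X = np.column_stack([np.ones(n), x])
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat
sigma2_mle = resid @ resid / n        # note: the MLE divides by n, not n - 2

# Gaussian log-likelihood evaluated at the MLE
loglik = -0.5 * n * (np.log(2 * np.pi * sigma2_mle) + 1)

# The "df" that R's logLik reports is the number of estimated parameters:
# intercept, slope, and sigma^2 -- i.e. 3 for simple linear regression.
k = X.shape[1] + 1
print(loglik, k)
```

This parameter count $k$ is what feeds into AIC ($\mathrm{AIC} = 2k - 2\,\ell$); it is a different quantity from the residual df $n-2$ used by the coefficient t-tests.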

Best Answer

  1. No. The significance level is always chosen by the practitioner in advance of conducting the hypothesis test. It is the chosen probability of making a Type I error. But yes, you have specified the df correctly.

  2. When $\sigma^2$ is known, $T\sim N(0,1)$. Otherwise, letting $a_j^2$ be the $j^{th}$ diagonal entry of $(X'X)^{-1}$, $$\begin{eqnarray*} T &=& \frac{\hat{\beta}_j-\beta_{j0}}{\tilde{\sigma}a_j} \\ &=& \frac{\frac{\hat{\beta}_j-\beta_{j0}}{\sigma a_j}}{\frac{\tilde{\sigma}a_j}{\sigma a_j}} \\ &=& \frac{\frac{\hat{\beta}_j-\beta_{j0}}{\sigma a_j}}{\sqrt{\frac{\tilde{\sigma}^2}{\sigma^2}}} \end{eqnarray*} $$ The numerator is distributed as $N(0,1)$ and is independent of the denominator. The square of the denominator, multiplied by $n-p-1$, is distributed as $\chi^2$ on $n-p-1$ df; equivalently, the denominator is the square root of a $\chi^2_{n-p-1}$ variate divided by its df. By the definition of the $t$ distribution, this ratio is therefore distributed as $t$ on $n-p-1$ df.
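The distributional claim can also be checked by simulation: repeatedly generate data with known $\beta$ and $\sigma$, form $T$, and compare its empirical quantiles with those of $t(n-p-1)$. A sketch with an arbitrary synthetic design:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n, p = 15, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
beta = np.array([1.0, -2.0, 0.5])
XtX_inv = np.linalg.inv(X.T @ X)
a_jj = XtX_inv[1, 1]                    # diagonal entry for beta_1

T = np.empty(20_000)
for i in range(T.size):
    y = X @ beta + rng.normal(size=n)   # true sigma = 1
    b = XtX_inv @ X.T @ y
    s2 = np.sum((y - X @ b) ** 2) / (n - p - 1)
    T[i] = (b[1] - beta[1]) / np.sqrt(s2 * a_jj)

# Compare empirical quantiles of T with those of t(n - p - 1) = t(12)
q = [0.05, 0.5, 0.95]
print(np.quantile(T, q))
print(stats.t.ppf(q, df=n - p - 1))
```

With a small $n$ like 15, the heavier tails of $t(12)$ relative to $N(0,1)$ are visible in the 5% and 95% quantiles, which is exactly why the normal approximation is avoided when $\sigma^2$ must be estimated.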

Related Question