The meaning of the residual standard error in linear ordinary least squares output

least-squares, regression, residuals, sums-of-squares

In the linear model where $Y=X\beta+e$ and $e\sim N(0,\sigma^2 I_n)$, I understand that the expected value of $RSS$ is $\sigma^2 (n-p)$, so we can estimate $\sigma$ with $\sqrt{RSS/(n-p)}$, and this is what R calls the "residual standard error" in the summary of a linear model.
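As a concrete check, the quantity R reports can be reproduced directly with numpy on simulated data (the data, design matrix, and parameter values below are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate a small linear model y = X beta + e, with e ~ N(0, sigma^2 I)
n, sigma = 50, 2.0
x = rng.uniform(0, 10, size=n)
X = np.column_stack([np.ones(n), x])   # design matrix: intercept + slope
p = X.shape[1]                          # number of estimated coefficients
beta = np.array([1.0, 0.5])
y = X @ beta + rng.normal(0, sigma, size=n)

# Ordinary least squares fit
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta_hat
rss = np.sum(resid**2)

# sqrt(RSS / (n - p)): the "Residual standard error" in R's summary.lm output
resid_std_error = np.sqrt(rss / (n - p))
print(resid_std_error)
```

With a reasonable sample size the printed value lands close to the true $\sigma$ used in the simulation.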

My first question is what the phrase "residual standard error" actually refers to. I think this was asked before, but the answer did not seem specific enough. I would expect "residual standard error" to refer to an estimate of the standard deviation of the fitted residuals $\hat{e}_i$, but it is not obvious to me that $\operatorname{sd}(\hat{e}_i)$ equals $\sigma$. Or does it mean an estimate of the standard deviation of the errors $e_i$, and is simply misnamed?

Also, even though I understand (on paper) why $RSS/(n-p)$ is an unbiased estimator of $\sigma^2$, why exactly does the usual sample-variance formula $\sum_i (\hat{e}_i-0)^2/(n-1)$ not apply? The $\hat{e}_i$ look like a "sample from a population with variance $\sigma^2$", but apparently some condition prevents the usual formula from applying.
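The bias from dividing by $n-1$ instead of $n-p$ is easy to see by simulation. In this sketch (arbitrary design with $n=10$, $p=3$, $\sigma=1$), $E[RSS] = \sigma^2(n-p)$, so $RSS/(n-1)$ systematically underestimates $\sigma^2$:

```python
import numpy as np

rng = np.random.default_rng(2)

n, p, sigma = 10, 3, 1.0
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta = np.ones(p)

# Residual-maker matrix M = I - H; residuals are M y and RSS = ||M y||^2
M = np.eye(n) - X @ np.linalg.inv(X.T @ X) @ X.T

# Many replications of the same regression, vectorized over rows
reps = 100_000
Y = X @ beta + rng.normal(0, sigma, size=(reps, n))
resid = Y @ M                  # M is symmetric, so this applies M to each row
rss = np.sum(resid**2, axis=1)

print(np.mean(rss / (n - p)))  # approximately 1.0: unbiased for sigma^2
print(np.mean(rss / (n - 1)))  # approximately (n-p)/(n-1) = 7/9: biased low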

Sorry if these have been asked before, but none of the posts I checked exactly answered my questions.

Best Answer

The error is adjusted for the number of parameters in the model, just as with the ordinary standard deviation. For the standard deviation, one adjusts for the fact that the mean is itself an estimate by using $n-1$ in the denominator rather than $n$. If the model estimates other parameters as well, one has to subtract all $p$ of them from $n$ to adjust for that. Suppose $p=n$: then there is no "play" left in the solution, and no regression at all. What you have in that case is an exact solution, not a regression per se. If $n=p+1$, there is only one degree of freedom, so all of the residual variability rests, in effect, on a single surplus measurement, which on its own accounts for the entire root RSS; such an estimate is very unreliable. As $n$ becomes progressively larger relative to $p$, the estimate stabilizes, and in general (for non-linear as well as linear modelling) we adjust by $n-p$.
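The $p=n$ edge case above can be demonstrated in a few lines (random square design, made up for illustration): the "fit" is an exact interpolation, every residual is zero, and there are no degrees of freedom left with which to estimate $\sigma$.

```python
import numpy as np

rng = np.random.default_rng(3)

# With p = n, solving X beta = y exactly leaves zero residuals: RSS = 0
n = 4
X = rng.normal(size=(n, n))          # square, (almost surely) invertible design
y = rng.normal(size=n)
beta_hat = np.linalg.solve(X, y)     # exact solution, not a regression
resid = y - X @ beta_hat
print(np.sum(resid**2))              # 0, up to floating-point noise
```

Dividing that zero RSS by $n-p=0$ is undefined, which is exactly the sense in which no estimate of $\sigma$ is possible here.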

BTW, the residual sum of squares is defined as $RSS = \sum_{i=1}^n (y_i - f(x_i))^2$ for any fitted function $f$, whether linear or not.