Solved – Why is $\sigma^2 = Var(\epsilon)$ when computing the standard error of a simple linear regression slope parameter

regression, standard error

Assume the true underlying linear relationship for a set of data is $Y=2+3X +\epsilon$, where $\epsilon$ represents the irreducible error inherent in a linear approximation. I then perform a linear regression and arrive at my $\hat{\beta}_0$ and $\hat{\beta}_1$ parameter estimates. To determine the standard errors of $\hat{\beta}_0$ and $\hat{\beta}_1$, I use the following formulas:

$$
SE(\hat{\beta}_0)^2= \sigma^2 \left[ \frac{1}{n} + \frac{\bar{x}^2}{\sum_{i=1}^n(x_i-\bar{x})^2} \right]
$$

$$
SE(\hat{\beta}_1)^2= \frac{\sigma^2}{\sum_{i=1}^n(x_i-\bar{x})^2}
$$

where $\sigma^2 = Var(\epsilon)$

Why is $\sigma^2 = Var(\epsilon)$ and not $\sigma^2 = Var(x)$ (the variance of the $x$ values in my sample)?

Where does $\epsilon$ work its way into the derivation of the standard errors of these two parameters? $\epsilon$ has nothing to do with the least squares approach for calculating $\hat{\beta}_0$ and $\hat{\beta}_1$ in the first place. Also, how do we determine $\epsilon$? Doesn't that require knowing that the "true" linear approximation of the data is $Y=2+3X +\epsilon$? We obviously would not know this in real life.

EDIT: one last question: what does it mean to assume that the errors $\epsilon_i$ for each observation are uncorrelated with common variance $\sigma^2$?

Best Answer

I think your question comes from confusing true parameters with their estimates.

All the quantities you mention in your question are "true" (unobservable!) population parameters:

  • $\epsilon$
  • $\sigma^2 = Var(\epsilon)$
  • $SE(\hat{\beta}_0)^2= \sigma^2 \left[ \frac{1}{n} + \frac{\bar{x}^2}{\sum_{i=1}^n(x_i-\bar{x})^2} \right]$
  • $SE(\hat{\beta}_1)^2= \frac{\sigma^2}{\sum_{i=1}^n(x_i-\bar{x})^2}$

Note that even the standard errors you are asking about are unobservable parameters! They are the "true" standard errors of the $\beta$ estimates: the variability you would see if you kept drawing samples from an infinite population and fitting a new regression model on each of those samples. What you DO calculate from your single sample are their estimates, obtained by replacing the variance of the true errors with the variance of the residuals, $\hat{\sigma}^2 = Var(e)$ (see this question and its great answers for an explanation of the difference: What is the difference between errors and residuals?):

  • $\widehat{SE}(\hat{\beta}_0)^2= \hat{\sigma}^2 \left[ \frac{1}{n} + \frac{\bar{x}^2}{\sum_{i=1}^n(x_i-\bar{x})^2} \right]$
  • $\widehat{SE}(\hat{\beta}_1)^2= \frac{\hat{\sigma}^2}{\sum_{i=1}^n(x_i-\bar{x})^2}$

Both $Var(e)$ and $Var(x)$ actually enter these estimated standard errors. Taking $\widehat{SE}(\hat{\beta}_1)^2$ as an example, you can rewrite its formula by dividing the numerator and the denominator by $n$:

$$ \widehat{SE}(\hat{\beta}_1)^2= \frac{\hat{\sigma}^2 / n}{\frac{1}{n}\sum_{i=1}^n(x_i-\bar{x})^2} $$

and notice that the term in the denominator is actually $Var(x)$, the sample variance of $x$.
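As for where $\epsilon$ works its way into the derivation: the least-squares fit itself never uses $\epsilon$, but the true model determines how $\hat{\beta}_1$ varies from sample to sample. A standard sketch of the argument (treating the $x_i$ as fixed) is to substitute $y_i = \beta_0 + \beta_1 x_i + \epsilon_i$ into the least-squares formula:

$$
\hat{\beta}_1 = \frac{\sum_{i=1}^n (x_i-\bar{x})(y_i-\bar{y})}{\sum_{i=1}^n (x_i-\bar{x})^2}
= \beta_1 + \frac{\sum_{i=1}^n (x_i-\bar{x})\,\epsilon_i}{\sum_{i=1}^n (x_i-\bar{x})^2},
$$

and if the $\epsilon_i$ are uncorrelated with common variance $\sigma^2$, then

$$
Var(\hat{\beta}_1) = \frac{\sigma^2 \sum_{i=1}^n (x_i-\bar{x})^2}{\left[ \sum_{i=1}^n (x_i-\bar{x})^2 \right]^2} = \frac{\sigma^2}{\sum_{i=1}^n (x_i-\bar{x})^2}.
$$

This is why $Var(\epsilon)$, and not $Var(x)$, plays the role of $\sigma^2$: the randomness of $\hat{\beta}_1$ around $\beta_1$ comes entirely from the errors, while the $x_i$ enter only through the denominator.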
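If it helps, here is a minimal simulation sketch of the distinction (the sample size, the range of $x$, $\sigma = 1.5$ and the seed are arbitrary choices, not anything from the question). It draws many samples from the true model $Y = 2 + 3X + \epsilon$, checks that the spread of the resulting $\hat{\beta}_1$ values matches the "true" $SE(\hat{\beta}_1)$ built from $\sigma^2 = Var(\epsilon)$, and then computes the single-sample estimate based on the residuals:

```python
import numpy as np

rng = np.random.default_rng(0)
n, sigma = 50, 1.5                    # illustrative sample size and true sd of epsilon
x = rng.uniform(0, 10, size=n)        # keep the x values fixed across replications
sxx = np.sum((x - x.mean()) ** 2)

# "True" (unobservable) standard error of beta1_hat, using sigma^2 = Var(epsilon)
true_se = sigma / np.sqrt(sxx)

def fit_slope(y):
    """Least-squares slope of y on x."""
    return np.sum((x - x.mean()) * (y - y.mean())) / sxx

# Repeated sampling: the spread of beta1_hat across many samples is what SE means
slopes = [fit_slope(2 + 3 * x + rng.normal(0, sigma, size=n)) for _ in range(20000)]
print("true SE:                   ", true_se)
print("sd of slopes (Monte Carlo):", np.std(slopes))

# With a single sample we can only ESTIMATE it, via the residuals e_i
y = 2 + 3 * x + rng.normal(0, sigma, size=n)
b1 = fit_slope(y)
b0 = y.mean() - b1 * x.mean()
resid = y - (b0 + b1 * x)
sigma2_hat = np.sum(resid ** 2) / (n - 2)     # residual variance (usual n-2 correction)
print("estimated SE (one sample):", np.sqrt(sigma2_hat / sxx))

# The denominator sum_i (x_i - xbar)^2 is just n * Var(x)
print("n * Var(x) equals sxx:", np.isclose(n * np.var(x), sxx))
```

The Monte Carlo standard deviation agrees closely with the formula, while the single-sample estimate merely fluctuates around it, which is exactly the parameter-versus-estimate distinction above.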

Answering your last question:

  • Uncorrelated errors means that the value of one error tells you nothing about the values of the other errors; e.g. if one of the points lies above the regression line (positive error), this does not make it more likely that the neighbouring point also lies above the line. If this assumption does not hold, the points may exhibit patterns around the line, e.g. form "waves" around it.
  • Common variance means that the spread of the points around the regression line is constant. For some situations such an assumption is not reasonable, e.g. if you regress income against years of education, income is likely to increase with the length of education, but there will also be larger differences in income among high-earning individuals than among low-income ones (the short simulation sketch below illustrates both assumptions and their violations).
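If it helps to see these two assumptions as data-generating code, the sketch below (purely illustrative numbers, not real income data) generates one dataset where both assumptions hold, one with correlated "wave-like" errors, and one where the error variance grows with $x$; plotting $y$ against $x$ for the three cases makes the patterns described above visible:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x = np.sort(rng.uniform(0, 10, size=n))

# 1) Assumptions hold: uncorrelated errors with a common variance
eps_ok = rng.normal(0, 1.0, size=n)

# 2) Correlated errors: each error drags its neighbour in the same direction,
#    so the points form "waves" around the regression line (AR(1)-style errors)
eps_corr = np.zeros(n)
for i in range(1, n):
    eps_corr[i] = 0.9 * eps_corr[i - 1] + rng.normal(0, 1.0)

# 3) No common variance: the spread grows with x, like incomes becoming more
#    dispersed at higher education levels
eps_hetero = rng.normal(0, 0.2 + 0.3 * x)

y_ok, y_corr, y_hetero = (2 + 3 * x + e for e in (eps_ok, eps_corr, eps_hetero))
```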