Solved – Why is $\sigma^2 = Var(\epsilon)$ when computing the standard error of a simple linear regression slope parameter

regression, standard error

Assume the true underlying linear relationship for a set of data is $Y=2+3X +\epsilon$, where $\epsilon$ represents the irreducible error inherent in a linear approximation. I then perform a linear regression and arrive at my $\hat{\beta}_0$ and $\hat{\beta}_1$ parameter estimates. To determine the standard errors of $\hat{\beta}_0$ and $\hat{\beta}_1$, I use the following formulas:

$$
SE(\hat{\beta}_0)^2= \sigma^2 \left[ \frac{1}{n} + \frac{\bar{x}^2}{\sum_{i=1}^n(x_i-\bar{x})^2} \right]
$$

$$
SE(\hat{\beta}_1)^2= \frac{\sigma^2}{\sum_{i=1}^n(x_i-\bar{x})^2}
$$

where $\sigma^2 = Var(\epsilon)$

Why is $\sigma^2 = Var(\epsilon)$ and not $\sigma^2 = Var(x)$ (the variance of the $x$ values in my sample)?

Where does $\epsilon$ work its way into the derivation of the standard errors of these two parameters? $\epsilon$ has nothing to do with the least squares approach for calculating $\hat{\beta}_0$ and $\hat{\beta}_1$ in the first place. Also, how do we determine $\epsilon$? Doesn't that require knowing that the "true" linear approximation of the data is $Y=2+3X +\epsilon$? We obviously would not know this in real life.

EDIT: one last question: what does it mean to assume that the errors $\epsilon_i$ for each observation are uncorrelated with common variance $\sigma^2$?

Best Answer

I think your question comes from confusing true parameters with their estimates.

All the quantities you mention in your question are "true" (unobservable!) population parameters:

  • $\epsilon$
  • $\sigma^2 = Var(\epsilon)$
  • $SE(\hat{\beta}_0)^2= \sigma^2 \left[ \frac{1}{n} + \frac{\bar{x}^2}{\sum_{i=1}^n(x_i-\bar{x})^2} \right]$
  • $SE(\hat{\beta}_1)^2= \frac{\sigma^2}{\sum_{i=1}^n(x_i-\bar{x})^2}$

Note that even the standard errors you are asking about are unobservable parameters! They are the "true" standard errors of the $\beta$ estimates: the variability you would see if you kept drawing samples from an infinite population and fitting a new regression model on each of those samples. What you DO calculate from your single sample are their estimates, obtained by replacing the variance of the true errors with the variance of the residuals, $\hat{\sigma}^2 = Var(e)$ (see this question and its great answers for an explanation of the difference: What is the difference between errors and residuals?):

  • $\widehat{SE}(\hat{\beta}_0)^2= \hat{\sigma}^2 \left[ \frac{1}{n} + \frac{\bar{x}^2}{\sum_{i=1}^n(x_i-\bar{x})^2} \right]$
  • $\widehat{SE}(\hat{\beta}_1)^2= \frac{\hat{\sigma}^2}{\sum_{i=1}^n(x_i-\bar{x})^2}$

Both $Var(e)$ and $Var(x)$ actually enter these estimated standard errors. Taking $\widehat{SE}(\hat{\beta}_1)^2$ as an example, you can rewrite its formula by dividing the numerator and the denominator by $n$:

$$ \widehat{SE}(\hat{\beta}_1)^2= \frac{\hat{\sigma}^2 / n}{\frac{1}{n}\sum_{i=1}^n(x_i-\bar{x})^2} $$

and notice that the term in the denominator is actually $Var(x)$, the sample variance of $x$.
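As for where $\epsilon$ works its way into the derivation: the least-squares fit itself never uses $\epsilon$, but the true model determines how $\hat{\beta}_1$ varies from sample to sample. A standard sketch of the argument (treating the $x_i$ as fixed) is to substitute $y_i = \beta_0 + \beta_1 x_i + \epsilon_i$ into the least-squares formula:

$$
\hat{\beta}_1 = \frac{\sum_{i=1}^n (x_i-\bar{x})(y_i-\bar{y})}{\sum_{i=1}^n (x_i-\bar{x})^2}
= \beta_1 + \frac{\sum_{i=1}^n (x_i-\bar{x})\,\epsilon_i}{\sum_{i=1}^n (x_i-\bar{x})^2},
$$

and if the $\epsilon_i$ are uncorrelated with common variance $\sigma^2$, then

$$
Var(\hat{\beta}_1) = \frac{\sigma^2 \sum_{i=1}^n (x_i-\bar{x})^2}{\left[ \sum_{i=1}^n (x_i-\bar{x})^2 \right]^2} = \frac{\sigma^2}{\sum_{i=1}^n (x_i-\bar{x})^2}.
$$

This is why $Var(\epsilon)$, and not $Var(x)$, plays the role of $\sigma^2$: the randomness of $\hat{\beta}_1$ around $\beta_1$ comes entirely from the errors, while the $x_i$ enter only through the denominator.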
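If it helps, here is a minimal simulation sketch of the distinction (the sample size, the range of $x$, $\sigma = 1.5$ and the seed are arbitrary choices, not anything from the question). It draws many samples from the true model $Y = 2 + 3X + \epsilon$, checks that the spread of the resulting $\hat{\beta}_1$ values matches the "true" $SE(\hat{\beta}_1)$ built from $\sigma^2 = Var(\epsilon)$, and then computes the single-sample estimate based on the residuals:

```python
import numpy as np

rng = np.random.default_rng(0)
n, sigma = 50, 1.5                    # illustrative sample size and true sd of epsilon
x = rng.uniform(0, 10, size=n)        # keep the x values fixed across replications
sxx = np.sum((x - x.mean()) ** 2)

# "True" (unobservable) standard error of beta1_hat, using sigma^2 = Var(epsilon)
true_se = sigma / np.sqrt(sxx)

def fit_slope(y):
    """Least-squares slope of y on x."""
    return np.sum((x - x.mean()) * (y - y.mean())) / sxx

# Repeated sampling: the spread of beta1_hat across many samples is what SE means
slopes = [fit_slope(2 + 3 * x + rng.normal(0, sigma, size=n)) for _ in range(20000)]
print("true SE:                   ", true_se)
print("sd of slopes (Monte Carlo):", np.std(slopes))

# With a single sample we can only ESTIMATE it, via the residuals e_i
y = 2 + 3 * x + rng.normal(0, sigma, size=n)
b1 = fit_slope(y)
b0 = y.mean() - b1 * x.mean()
resid = y - (b0 + b1 * x)
sigma2_hat = np.sum(resid ** 2) / (n - 2)     # residual variance (usual n-2 correction)
print("estimated SE (one sample):", np.sqrt(sigma2_hat / sxx))

# The denominator sum_i (x_i - xbar)^2 is just n * Var(x)
print("n * Var(x) equals sxx:", np.isclose(n * np.var(x), sxx))
```

The Monte Carlo standard deviation agrees closely with the formula, while the single-sample estimate merely fluctuates around it, which is exactly the parameter-versus-estimate distinction above.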

Answering your last question:

  • Uncorrelated errors means that the value of one error tells you nothing about the values of the other errors; e.g. if one of the points lies above the regression line (positive error), this does not make it more likely that the neighbouring point also lies above the line. If this assumption does not hold, the points may exhibit patterns around the line, e.g. form "waves" around it.
  • Common variance means that the spread of the points around the regression line is constant. For some situations such an assumption is not reasonable, e.g. if you regress income against years of education, income is likely to increase with the length of education, but there will also be larger differences in income among high-earning individuals than among low-income ones (the short simulation sketch below illustrates both assumptions and their violations).
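If it helps to see these two assumptions as data-generating code, the sketch below (purely illustrative numbers, not real income data) generates one dataset where both assumptions hold, one with correlated "wave-like" errors, and one where the error variance grows with $x$; plotting $y$ against $x$ for the three cases makes the patterns described above visible:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x = np.sort(rng.uniform(0, 10, size=n))

# 1) Assumptions hold: uncorrelated errors with a common variance
eps_ok = rng.normal(0, 1.0, size=n)

# 2) Correlated errors: each error drags its neighbour in the same direction,
#    so the points form "waves" around the regression line (AR(1)-style errors)
eps_corr = np.zeros(n)
for i in range(1, n):
    eps_corr[i] = 0.9 * eps_corr[i - 1] + rng.normal(0, 1.0)

# 3) No common variance: the spread grows with x, like incomes becoming more
#    dispersed at higher education levels
eps_hetero = rng.normal(0, 0.2 + 0.3 * x)

y_ok, y_corr, y_hetero = (2 + 3 * x + e for e in (eps_ok, eps_corr, eps_hetero))
```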