Solved – the difference between $\beta_1$ and $\hat{\beta}_1$

regression

Suppose I have a random sample $\lbrace x_n ,y_n \rbrace_{n=1}^N$.

Suppose $$y_n = \beta_0 + \beta_1 x_n + \varepsilon_n$$

and $$\hat{y}_n = \hat{\beta}_0 +\hat{\beta}_1 x_n$$

What is the difference between $\beta_1$ and $\hat{\beta}_1$?

Best Answer

$\beta_1$ is an idea - it is the slope of the population regression line, which you never observe in practice. If the Gauss-Markov assumptions hold, the errors $\varepsilon_n$ around that line have mean zero and constant variance at every value of $x$. Normality is a separate, additional assumption: if it also holds, the values above and below the line in a vertical "slice" at any fixed $x$ form a Gaussian distribution centered on the line. $\hat \beta_1$ is the estimate of $\beta_1$ based on the sample.

The idea is that you are working with a sample from a population. Your sample forms a data cloud, if you will. One of the dimensions corresponds to the dependent variable, and you try to fit the line that minimizes the sum of squared residuals - in OLS, the fitted values are the orthogonal projection of the dependent variable onto the column space of the model matrix. The resulting estimates of the population parameters are denoted with the $\hat \beta$ symbol. Under the usual assumptions, the more data points you have, the more precise the estimated coefficients $\hat \beta_i$ tend to be, and the better they approximate the idealized population coefficients $\beta_i$.
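For the simple regression in the question, the least-squares estimates can be written in closed form. This is a sketch of the standard result; $\bar{x}$ and $\bar{y}$ are notation introduced here for the sample means and do not appear in the question:

$$\hat{\beta}_1 = \frac{\sum_{n=1}^N (x_n - \bar{x})(y_n - \bar{y})}{\sum_{n=1}^N (x_n - \bar{x})^2}, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$$

Substituting $y_n = \beta_0 + \beta_1 x_n + \varepsilon_n$ into the numerator gives

$$\hat{\beta}_1 = \beta_1 + \frac{\sum_{n=1}^N (x_n - \bar{x})\,\varepsilon_n}{\sum_{n=1}^N (x_n - \bar{x})^2}$$

so $\hat{\beta}_1$ equals the fixed population slope $\beta_1$ plus a term that depends on the particular errors drawn in the sample. That second term is why $\hat{\beta}_1$ changes from sample to sample while $\beta_1$ does not.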

Here is the difference in slopes ($\beta$ versus $\hat \beta$) between the "population" in blue, and the sample in isolated black dots:

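[Figure: plot of the "population" points in blue with the population line in solid blue, and the sample as isolated black dots with its fitted regression line in dotted black]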

The sample regression line (slope $\hat \beta_1$) is dotted and in black, whereas the synthetically perfect "population" line (slope $\beta_1$) is in solid blue. The abundance of population points gives a visual sense of the normality of the error distribution around that line.
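A comparison like this can be reproduced with a small simulation. The following is a minimal sketch, not the code behind the original plot: the parameter values `BETA_0`, `BETA_1`, `SIGMA`, the uniform range for $x$, the sample sizes, and the helper names `draw_sample` and `ols_slope` are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical "true" population parameters, chosen only for illustration.
BETA_0, BETA_1, SIGMA = 2.0, 0.5, 1.0


def draw_sample(n, rng):
    """Draw n points from y = BETA_0 + BETA_1 * x + eps, eps ~ N(0, SIGMA^2)."""
    x = rng.uniform(0.0, 10.0, size=n)
    eps = rng.normal(0.0, SIGMA, size=n)
    return x, BETA_0 + BETA_1 * x + eps


def ols_slope(x, y):
    """OLS slope: sum((x - xbar) * (y - ybar)) / sum((x - xbar)^2)."""
    xc = x - x.mean()
    return float(np.sum(xc * (y - y.mean())) / np.sum(xc ** 2))


rng = np.random.default_rng(0)  # fixed seed so the run is repeatable
small = ols_slope(*draw_sample(20, rng))      # estimate from a small sample
large = ols_slope(*draw_sample(10_000, rng))  # estimate from a large sample

print(f"beta_1 (population)    = {BETA_1}")
print(f"beta_1_hat (N = 20)    = {small:.4f}")
print(f"beta_1_hat (N = 10000) = {large:.4f}")
```

In the simulation $\beta_1$ is known because we chose it; with real data only the $\hat \beta_1$ values would be available. Rerunning with a different seed changes both estimates but leaves `BETA_1` untouched, and the large-sample estimate typically lands much closer to it.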
