[Math] Understanding linear regression better

linear regression, statistics

I have been trying to understand linear regression as thoroughly as possible, so I am asking this question to keep making progress.

There's simple linear regression, where we have just one independent variable $x$, and multiple linear regression, where there are multiple predictors.

In a linear regression problem we usually have data for the predictors and the response variable, and what we need to find is the line that best describes the linear relationship between $y$ and the predictors.

Of course a line will not perfectly describe the relationship between the observations $x_i$ and $y_i$ unless the relationship is truly linear, which is usually not the case in the real world (I guess). That's why we want to estimate the "best line": there are errors, which we usually denote by $\epsilon$ in the following perfect model $$y = \beta_0 + \beta_1 x_1 + \cdots + \beta_n x_n + \epsilon$$
The line that we want to estimate therefore has no "error terms" $\epsilon$, because they cannot be estimated, so it should look like $$\hat{y} = \beta_0 + \beta_1 \hat{x}_1 + \cdots + \beta_n \hat{x}_n$$ What we need to find from this line are the coefficients, and we can find them using least squares, for example.
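To make the least-squares step concrete, here is a minimal sketch (assuming Python with NumPy; the data, coefficient values, and variable names are all illustrative assumptions, since the question gives none). It builds a design matrix with an intercept column and solves for the coefficient estimates:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: two predictors and a known "perfect model" plus noise.
n_samples = 100
x1 = rng.uniform(0, 10, n_samples)
x2 = rng.uniform(0, 10, n_samples)
eps = rng.normal(0, 1, n_samples)       # the unobservable errors
y = 2.0 + 0.5 * x1 - 1.5 * x2 + eps     # beta_0 = 2, beta_1 = 0.5, beta_2 = -1.5

# Design matrix with a column of ones for the intercept.
X = np.column_stack([np.ones(n_samples), x1, x2])

# Ordinary least squares: minimize ||y - X beta||^2.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)  # estimates of (beta_0, beta_1, beta_2)
```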

This is what I have understood so far, and I am not even sure it is all correct. My question is: could you please help me by confirming, correcting, and adding any information that you think might be useful for further understanding this statistical model? I know this might be a broad question, but if you feel I need to understand more details, please help me!

Best Answer

Many will consider this an unreasonably broad question, and I certainly won't dispute that. I'll just address some points suggested by your comments.

  • A common confusion is to think that it's called "linear regression" because it fits a line. People then get confused when they hear that fitting a parabola is also linear regression and that "nonlinear regression" is something other than that. The model is "linear" because it is linear in the coefficients $\beta_j$, not in the predictors, so fitting $y = \beta_0 + \beta_1 x + \beta_2 x^2$ is still linear regression (see the sketch after this list). See, however, this answer.
  • Your "hats" are in the wrong places. You need $$\hat y = \hat\beta_0 + \hat\beta_1 x_1 + \cdots + \hat\beta_n x_n.$$ I have qualms about the use of the letter $n$ for this purpose, since that often means the sample size.
  • You say the errors cannot be estimated, but in fact they can: the residuals $\hat\varepsilon_i = y_i - \hat y_i$ are the observable estimates of the unobservable errors $\varepsilon_i$ (the sketch below illustrates this as well). See this article: https://en.wikipedia.org/wiki/Errors_and_residuals
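To illustrate the two points above concretely, here is a minimal sketch (again assuming Python with NumPy and synthetic data of my own choosing): it fits a parabola by ordinary linear least squares, which works precisely because the model is linear in the coefficients, and then computes the residuals as observable estimates of the errors.

```python
import numpy as np

rng = np.random.default_rng(1)

# A parabola plus noise: nonlinear in x, but linear in the coefficients.
x = np.linspace(-3, 3, 200)
eps = rng.normal(0, 0.5, x.size)        # unobservable errors
y = 1.0 - 2.0 * x + 0.75 * x**2 + eps

# The design matrix [1, x, x^2] turns this into an ordinary linear regression.
X = np.column_stack([np.ones_like(x), x, x**2])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# Residuals: the observable estimates of the errors eps.
y_hat = X @ beta_hat
residuals = y - y_hat
print(beta_hat)                    # close to (1.0, -2.0, 0.75)
print(residuals.std(), eps.std())  # residual spread is close to the error spread
```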