Regression – Do We Actually Take a Random Line in the First Step of Linear Regression?

linear-model · machine-learning · regression

[Screenshot: the "start with a random line" step from the video]

This is a screenshot I took from a video on linear regression by Luis Serrano. He explains linear regression step by step, building it from scratch, and the first step is to start with a random line.

The question is: do we actually draw a random line, or do we instead perform some calculation, such as taking the average of the y values, to draw the initial line? If we take a truly random line, it might not fall near any of the points at all; it could, for example, land in the third quadrant of the coordinate system, where there are no points in this case.

Best Answer

NO

What we want to find are the parameters that result in the least error, and OLS defines error as the sum of squared differences between the observed values $y_i$ and the predicted values $\hat y_i$. Error often gets denoted by an $L$ for "loss".

$$ L(y, \hat y) = \sum_{i = 1}^N \bigg(y_i - \hat y_i\bigg)^2 $$

We have our regression model, $\hat y_i = \hat\beta_0 + \hat\beta_1 x_i$, so $\hat y_i$ is a function of $\hat\beta_0$ and $\hat\beta_1$.

$$ L(y, \hat\beta_0, \hat\beta_1) = \sum_{i = 1}^N \bigg(y_i - (\hat\beta_0 + \hat\beta_1 x_i)\bigg)^2 $$

We want to find the $\hat\beta_0$ and $\hat\beta_1$ that minimize $L$.
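To make this concrete, here is a minimal Python sketch of that loss function. The data arrays `x` and `y` are hypothetical, made-up values, not the points from the video:

```python
import numpy as np

# Hypothetical example data (not the points from the video).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.3, 5.9, 8.2, 9.8])

def loss(beta0, beta1):
    """Sum of squared differences between observed y and predicted y-hat."""
    y_hat = beta0 + beta1 * x
    return np.sum((y - y_hat) ** 2)

print(loss(0.0, 2.0))  # loss for the candidate line y-hat = 0 + 2x
```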

What the video does is simulate pieces of the entire loss function. For $\hat\beta_0 = 1$ and $\hat\beta_1 = 7$, you get one loss value; for $\hat\beta_0 = 1$ and $\hat\beta_1 = 8$, you get another. One approach to finding the minimum is to keep picking values until you find one whose loss seems low enough (or you're tired of waiting). Much of deep learning uses variations of this idea, with tricks like stochastic gradient descent to get (close to) the right answer in a reasonable amount of time.
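As a sketch of that trial-and-error idea (reusing the hypothetical `x`, `y`, and `loss` from above), a naive random search looks something like this:

```python
rng = np.random.default_rng(0)  # fixed seed so the run is reproducible

best_loss, best_params = np.inf, None
for _ in range(10_000):
    # Draw candidate parameters from an arbitrary, assumed search range.
    beta0, beta1 = rng.uniform(-10, 10, size=2)
    candidate = loss(beta0, beta1)
    if candidate < best_loss:
        best_loss, best_params = candidate, (beta0, beta1)

print(best_params, best_loss)
```

Gradient-based methods replace the blind sampling with steps guided by the slope of the loss, but the underlying idea of iteratively improving a candidate line is the same.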

In OLS linear regression, however, calculus gives us a solution to the minimization problem, and we do not have to play such games.

$$ \hat\beta_1 = \frac{\operatorname{cov}(x, y)}{\operatorname{var}(x)}, \qquad \hat\beta_0 = \bar y - \hat\beta_1 \bar x $$
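In code, the closed-form solution is a couple of lines (again using the hypothetical `x` and `y` above); `np.polyfit` should agree up to floating-point error:

```python
# np.cov returns the 2x2 covariance matrix; entry [0, 1] is cov(x, y).
# ddof=1 in both places so numerator and denominator use the same convention
# (the ratio is the same either way, as long as they match).
beta1_hat = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
beta0_hat = y.mean() - beta1_hat * x.mean()
print(beta0_hat, beta1_hat)

# Cross-check: np.polyfit(x, y, 1) returns [slope, intercept].
print(np.polyfit(x, y, 1))
```

No random starting line is needed: the formulas land directly on the parameters that minimize the loss.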