Linear regression (linear curve fit) – derivation of the ordinary least squares method


I am stepping through the derivation of linear regression (curve fitting) by the ordinary least squares method. Everything looks fine, except that I am puzzled by how multiple sources make the jump from step 1 to step 2 shown below.

Step 1:

$$m=\frac{\sum_i(\overline{y}-y_i)}{\sum_i(\overline{x}-x_i)}$$

Step 2:

$$m=\frac{\sum_i((\overline{y}-y_i)\cdot(\overline{x}-x_i))}{\sum_i(\overline{x}- x_i)^2}$$

Source1 | Source2

When I calculate the slope with step 1, which in theory should give the same result as step 2, I get a divide-by-zero error, because $\sum_i(\overline{x}-x_i)$ always equals $0$. So how can these two equations be equal when one yields a divide-by-zero error and the other returns the correct slope for a cluster of points?
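
To make the symptom concrete, here is a minimal Python sketch that evaluates both expressions on a small made-up data set. The function names `step1_sums` and `step2_slope` and the sample points are illustrative assumptions, not taken from either source.

```python
def step1_sums(xs, ys):
    """Return (numerator, denominator) of the "Step 1" expression."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    numerator = sum(y_bar - y for y in ys)    # deviations from the mean sum to 0
    denominator = sum(x_bar - x for x in xs)  # same here, hence the 0/0
    return numerator, denominator


def step2_slope(xs, ys):
    """Return the slope given by the "Step 2" expression."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    numerator = sum((y_bar - y) * (x_bar - x) for x, y in zip(xs, ys))
    denominator = sum((x_bar - x) ** 2 for x in xs)
    return numerator / denominator


# Hypothetical sample points, chosen only for illustration.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 5.0, 9.0]

print(step1_sums(xs, ys))   # both sums vanish, so step 1 is 0/0
print(step2_slope(xs, ys))  # a finite slope
```

For these points both step 1 sums are zero, so the quotient is undefined for any data set, while step 2 gives a slope of 2.2.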

UPDATE: The derivations shown in the sources are incorrect and misleading! Step 2 is in fact the correct answer; however, both sources claim that step 2 follows from step 1, which is incorrect. The equation where the sources go wrong is shown below, along with the corrected derivation.

$$\sum_i(y_i\cdot x_i-\overline{y}\cdot x_i+m\cdot\overline{x}\cdot x_i-m\cdot x_i^2)=0$$

The incorrect step in the sources was to factor out $x_i$ and divide both sides by $x_i$ to remove it from the equation. This is invalid because $x_i$ is not a constant and cannot be pulled out of the summation.
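
A two-point counterexample (the numbers are made up for illustration) shows why a factor that varies with $i$ cannot be cancelled from a sum. Take $x_1=1$, $x_2=2$ and terms $a_1=2$, $a_2=-1$:

$$\sum_i x_i\cdot a_i = 1\cdot 2 + 2\cdot(-1) = 0 \qquad\text{but}\qquad \sum_i a_i = 2 + (-1) = 1 \neq 0$$

So $\sum_i x_i\cdot a_i=0$ does not imply $\sum_i a_i=0$, which is what dividing out $x_i$ would assume.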

The correct step would have been to split the summation into separate sums and solve for $m$:

$$m=\frac{\sum_i(\overline{y}\cdot x_i-y_i\cdot x_i)}{\sum_i(\overline{x}\cdot x_i-x_i^2)}$$
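
As a check that this agrees with step 2, one can expand the sums in step 2 and use $\sum_i(\overline{y}-y_i)=0$ and $\sum_i(\overline{x}-x_i)=0$:

$$\sum_i(\overline{y}-y_i)\cdot(\overline{x}-x_i)=\overline{x}\sum_i(\overline{y}-y_i)-\sum_i x_i\cdot(\overline{y}-y_i)=-\sum_i(\overline{y}\cdot x_i-y_i\cdot x_i)$$

$$\sum_i(\overline{x}-x_i)^2=\overline{x}\sum_i(\overline{x}-x_i)-\sum_i x_i\cdot(\overline{x}-x_i)=-\sum_i(\overline{x}\cdot x_i-x_i^2)$$

The two minus signs cancel in the quotient, so the expression for $m$ above equals the step 2 formula.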

Reference.

Best Answer

After an admittedly quick look, I think the derivations you cite are convoluted at best, and faulty at worst. I believe the step they make to get to what you are calling "Step 1" is incorrect. That expression does not follow from the prior step.

The OLS derivation on Wikipedia is sound.
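
For orientation, here is a short sketch of the standard argument (not a transcription of the Wikipedia article). The intercept symbol $b$ and the cost $S$ are notation assumed here; neither appears in the question. Fit $y\approx m\cdot x+b$ by minimizing

$$S(m,b)=\sum_i(y_i-m\cdot x_i-b)^2$$

Setting $\partial S/\partial b=0$ gives

$$\sum_i(y_i-m\cdot x_i-b)=0\quad\Rightarrow\quad b=\overline{y}-m\cdot\overline{x}$$

Setting $\partial S/\partial m=0$ and substituting this $b$ gives

$$\sum_i x_i\cdot(y_i-\overline{y}+m\cdot\overline{x}-m\cdot x_i)=0$$

Here $x_i$ stays inside the sum; collecting the terms in $m$ and solving leads to the step 2 formula without ever passing through step 1.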
