Finding equation of best fit line in simple linear regression

linear regression, regression

To find the best-fit line for a set of data $(x_i,y_i)$ by minimising the sum of squared residuals, one sets $\frac{\partial S}{\partial \theta_0}$ and $\frac{\partial S}{\partial \theta_1}$ to zero, where
\begin{equation}
S = \frac{1}{n}\sum_{i=1}^n (y_i - (\theta_0 + \theta_1x_i))^2
\end{equation}

Setting $\frac{\partial S}{\partial \theta_0} = 0$ gives (where $\bar{x}$ and $\bar{y}$ denote the means of the $x$ and $y$ values, respectively)
\begin{equation}
\bar{y} - \theta_1\bar{x} = \theta_0
\end{equation}
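
For reference, a sketch of the step behind this condition, using only the definition of $S$ above: differentiating with respect to $\theta_0$ and dividing the sums by $n$ gives
\begin{equation}
\frac{\partial S}{\partial \theta_0} = -\frac{2}{n}\sum_{i=1}^n (y_i - \theta_0 - \theta_1x_i) = 0 \quad\Longrightarrow\quad \frac{1}{n}\sum_{i=1}^n y_i - \theta_0 - \theta_1\,\frac{1}{n}\sum_{i=1}^n x_i = 0
\end{equation}
which is $\bar{y} - \theta_0 - \theta_1\bar{x} = 0$.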

And setting $\frac{\partial S}{\partial \theta_1} = 0$ and simplifying gives
\begin{equation}
\sum_{i=1}^n y_ix_i - \sum_{i=1}^n\theta_0x_i - \sum_{i=1}^n\theta_1x_i^2 = 0
\end{equation}

I'm stuck here. How do I proceed from this point to obtain $\theta_1$ in the following form?
\begin{equation}
\theta_1 = \frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^n (x_i - \bar{x})^2}
\end{equation}

Best Answer

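Plug $\hat{\theta}_0 = \bar{y} - \theta_1\bar{x}$ into the second equation, the one obtained from the derivative with respect to $\theta_1$. Just don't forget the signs, as the derivative of $-x\theta_1$ w.r.t. $\theta_1$ is $-x$. Then you have
$$ \sum x_i y_i - \sum x_i ( \bar{y} - \theta_1 \bar{x} ) - \theta_1\sum x_i ^ 2 = 0 $$
$$ \sum x_i y_i - \bar{y}\sum x_i + \theta_1\bar{x}\sum x_i - \theta_1 \sum x_i ^ 2 = 0 $$
Note that $\sum x_i = n \bar{x}$ and solve for $\theta_1$:
$$ \hat{\theta}_1 = \frac{ \sum x_i y_i - n\bar{y}\bar{x} }{ \sum x_i ^2 - n \bar{x}^2 } = \frac{ \sum (x_i - \bar{x}) (y_i - \bar{y}) }{ \sum ( x_i - \bar{x})^2 } $$
The last equality holds because $\sum (x_i - \bar{x})(y_i - \bar{y}) = \sum x_i y_i - \bar{y}\sum x_i - \bar{x}\sum y_i + n\bar{x}\bar{y} = \sum x_i y_i - n\bar{x}\bar{y}$, and likewise $\sum (x_i - \bar{x})^2 = \sum x_i^2 - n\bar{x}^2$.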
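
If it helps, the two forms of $\hat{\theta}_1$ can also be checked numerically. The following is only a minimal sketch in plain Python. The function names (`slope_raw`, `slope_centered`, `intercept`) and the sample data are made up for illustration and are not part of the question.

```python
from math import isclose


def slope_raw(xs, ys):
    """Slope as (sum x_i*y_i - n*x_bar*y_bar) / (sum x_i^2 - n*x_bar^2)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    numerator = sum(x * y for x, y in zip(xs, ys)) - n * x_bar * y_bar
    denominator = sum(x * x for x in xs) - n * x_bar ** 2
    return numerator / denominator


def slope_centered(xs, ys):
    """Slope as sum (x_i - x_bar)(y_i - y_bar) / sum (x_i - x_bar)^2."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    denominator = sum((x - x_bar) ** 2 for x in xs)
    return numerator / denominator


def intercept(xs, ys, theta1):
    """Intercept from the first condition: theta0 = y_bar - theta1 * x_bar."""
    n = len(xs)
    return sum(ys) / n - theta1 * sum(xs) / n


# Hypothetical sample data, chosen only for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

theta1 = slope_centered(xs, ys)
theta0 = intercept(xs, ys, theta1)

print("theta1 (centered form):", theta1)
print("theta1 (raw-sum form): ", slope_raw(xs, ys))
print("theta0:", theta0)
print("forms agree:", isclose(theta1, slope_raw(xs, ys), abs_tol=1e-9))
```

Both functions divide by a quantity that is zero when all $x_i$ are equal, so the sketch assumes at least two distinct $x$ values. In floating point the centered form tends to be the safer of the two, since the raw-sum form subtracts two potentially large, nearly equal numbers.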
