Solved – Least squares: calculus to find the RSS minimizers

least-squares, regression

While reading the section on simple linear regression in "An Introduction to Statistical Learning with Applications in R", I ran into a question about minimizing the residual sum of squares.
Quoting from the book:

… simple linear approach for predicting a quantitative response $Y$
on the basis of a single predictor variable $X$. It assumes that there
is approximately a linear relationship between $X$ and $Y$.
Mathematically, we can write this linear relationship as $$Y \approx
b_0 + b_1X$$ You might read $\approx$ as "is approximately modeled
as".

Once we have used our training data to produce estimates
$\hat b_0$ and $\hat b_1$ for the model coefficients, we can
predict future … by computing $$\hat y = \hat b_0 + \hat b_1 x, $$
where $\hat y$ indicates a prediction of $Y$ on the basis of $X = x$.
Here we use a hat symbol '^' to denote the estimated value for an
unknown parameter or coefficient, or to denote the predicted value of
the response.

In practice, $b_0$ and $b_1$ are unknown. So before we can … make
predictions, we must use data to estimate the coefficients. Let $(x_1,
y_1), (x_2, y_2), \dots, (x_n, y_n)$ represent $n$ observation pairs,
each of which consists of a measurement of $X$ and a measurement of
$Y$.

Let $\hat y_i = \hat b_0 + \hat b_1 x_i$ be the prediction for $Y$
based on the $i$th value of $X$. Then $e_i = y_i - \hat y_i$ represents
the $i$th residual … this is the difference between the $i$th observed
response value and the $i$th response value that is predicted by our
linear model. We define the residual sum of squares (RSS) as $$RSS
= e^2_1 + e^2_2 + \dots + e^2_n,$$ or equivalently as $$RSS = (y_1 - \hat b_0 - \hat b_1 x_1)^2 + (y_2 - \hat b_0 - \hat b_1 x_2)^2 + \dots + (y_n -
\hat b_0 - \hat b_1 x_n)^2$$

The least squares approach chooses $\hat b_0$ and $\hat b_1$ to minimize the
RSS. Using some calculus, one can show that the minimizers are:

$$ \hat b_1 = \frac {\sum_{i=1}^{n} (x_i - \bar{x}) (y_i - \bar{y}) }
{\sum_{i=1}^{n} (x_i - \bar{x})^2} $$

$$ \hat b_0 = \bar{y} - \hat b_1 \bar{x} $$

where $$ \bar{y} = \frac {1}{n} \sum_{i=1}^{n} y_i \qquad \text{and} \qquad \bar{x} = \frac {1}{n} \sum_{i=1}^{n} x_i $$ are the sample means.
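For concreteness, here is how I read those estimators in code (a rough sketch in Python; the data and variable names are made up by me, not taken from the book):

```python
import numpy as np

# made-up sample data, just to try the quoted formulas
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

x_bar = x.mean()
y_bar = y.mean()

# slope: cross-deviations of x and y over squared deviations of x
b1_hat = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
# intercept: the fitted line passes through (x_bar, y_bar)
b0_hat = y_bar - b1_hat * x_bar

y_hat = b0_hat + b1_hat * x        # fitted values
rss = np.sum((y - y_hat) ** 2)     # residual sum of squares
print(b0_hat, b1_hat, rss)
```

The code gives me numbers, but not the reasoning behind the formulas.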

So my question is: which book gives a detailed calculus derivation showing how we get the above minimizers?

Please also recommend a good textbook on least squares and regression in general.

Best Answer

The principle underlying least squares regression is that the sum of the squares of the errors is minimized. We can use calculus to find equations for the parameters $\beta_0$ and $\beta_1$ that minimize the sum of the squared errors, $S$.

$$S = \displaystyle\sum\limits_{i=1}^n \left(e_i \right)^2= \sum \left(y_i - \hat{y_i} \right)^2= \sum \left(y_i - \beta_0 - \beta_1x_i\right)^2$$
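Before doing any calculus, a quick numerical sanity check can be helpful (a minimal sketch in Python with made-up data, not part of the derivation): minimizing $S$ with a generic optimizer should land on the same $(\beta_0, \beta_1)$ that the closed-form solution derived below produces.

```python
import numpy as np
from scipy.optimize import minimize

# made-up data for illustration
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

def S(beta):
    """Sum of squared errors for a candidate (beta0, beta1)."""
    b0, b1 = beta
    return np.sum((y - b0 - b1 * x) ** 2)

result = minimize(S, x0=np.zeros(2))  # generic numerical minimization of S
print(result.x)                       # approximately (beta0, beta1)
```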

We want to find $\beta_0$ and $\beta_1$ that minimize the sum, $S$. We start by taking the partial derivative of $S$ with respect to $\beta_0$ and setting it to zero.

$$\frac{\partial{S}}{\partial{\beta_0}} = \sum 2\left(y_i - \beta_0 - \beta_1x_i\right)^1 (-1) = 0$$ $$\sum \left(y_i - \beta_0 - \beta_1x_i\right) = 0 $$ $$\sum \beta_0 = \sum y_i -\beta_1 \sum x_i $$ $$n\beta_0 = \sum y_i -\beta_1 \sum x_i $$ $$\beta_0 = \frac{1}{n}\sum y_i -\beta_1 \frac{1}{n}\sum x_i \tag{1}$$ $$\beta_0 = \bar y - \beta_1 \bar x \tag{*} $$

Now take the partial derivative of $S$ with respect to $\beta_1$ and set it to zero.
$$\frac{\partial{S}}{\partial{\beta_1}} = \sum 2\left(y_i - \beta_0 - \beta_1x_i\right)^1 (-x_i) = 0$$ $$\sum x_i \left(y_i - \beta_0 - \beta_1x_i\right) = 0$$ $$\sum x_iy_i - \beta_0 \sum x_i - \beta_1 \sum x_i^2 = 0 \tag{2}$$ substitute $(1)$ into $(2)$ $$\sum x_iy_i - \left( \frac{1}{n}\sum y_i -\beta_1 \frac{1}{n}\sum x_i\right) \sum x_i - \beta_1 \sum x_i^2 = 0 $$ $$\sum x_iy_i - \frac{1}{n} \sum x_i \sum y_i + \beta_1 \frac{1}{n} \left( \sum x_i \right) ^2 - \beta_1 \sum x_i^2 = 0 $$

$$\sum x_iy_i - \frac{1}{n} \sum x_i \sum y_i = - \beta_1 \frac{1}{n} \left( \sum x_i \right) ^2 + \beta_1 \sum x_i^2 $$ $$\sum x_iy_i - \frac{1}{n} \sum x_i \sum y_i = \beta_1 \left(\sum x_i^2 - \frac{1}{n} \left( \sum x_i \right) ^2 \right) $$ $$\beta_1 = \frac{\sum x_iy_i - \frac{1}{n} \sum x_i \sum y_i}{\sum x_i^2 - \frac{1}{n} \left( \sum x_i \right) ^2 } = \frac{\operatorname{cov}(x,y)}{\operatorname{var}(x)}\tag{*}$$

Since $\sum x_iy_i - \frac{1}{n} \sum x_i \sum y_i = \sum (x_i - \bar x)(y_i - \bar y)$ and $\sum x_i^2 - \frac{1}{n} \left( \sum x_i \right)^2 = \sum (x_i - \bar x)^2$, this is exactly the slope formula quoted from the book, and substituting it into $\beta_0 = \bar y - \beta_1 \bar x$ gives the intercept.
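As a check on the algebra, here is a minimal Python sketch (data and names made up for illustration) comparing these closed-form expressions with `np.polyfit`, which solves the same least-squares problem:

```python
import numpy as np

# made-up data for illustration
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
n = len(x)

# slope from the last line of the derivation
beta1 = ((np.sum(x * y) - np.sum(x) * np.sum(y) / n)
         / (np.sum(x ** 2) - np.sum(x) ** 2 / n))
# intercept from (*): beta0 = y_bar - beta1 * x_bar
beta0 = y.mean() - beta1 * x.mean()

# np.polyfit(x, y, 1) returns [slope, intercept] of the degree-1 least-squares fit
slope, intercept = np.polyfit(x, y, 1)
print(np.allclose([beta1, beta0], [slope, intercept]))  # should print True
```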