I am learning regression, and I don't understand why we need to square in the residual sum of squares. What's wrong with just using the sum of the residuals as the error value? What is the benefit of squaring the residuals?

# Solved – In residual sum of squares, why do we need to square?


#### Related Solutions

If you take a normal regression model, $$Y_i|X_i\sim\mathcal{N}(X_i^\text{T}\beta,\sigma^2),$$ the density of the data $(Y_1,\ldots,Y_n)$ is proportional to \begin{align*}&\exp\left\{ -\frac{1}{2\sigma^2}\sum_{i=1}^n (Y_i-X_i^\text{T}\beta)^2 \right\}\\ &\qquad=\exp\left\{-\frac{1}{2\sigma^2}\sum_{i=1}^n [(Y_i-X_i^\text{T}\hat{\beta})^2 +(X_i^\text{T}\hat{\beta}-X_i^\text{T}\beta)^2]\right\}\\ &\qquad=\exp\left\{-\frac{1}{2\sigma^2}[\text{SSR}+\text{SSE}]\right\}\\ &\qquad=\exp\left\{-\frac{1}{2\sigma^2}\text{SSE}\right\}\times\exp\left\{-\frac{1}{2\sigma^2}\text{SSR}\right\}\end{align*} where $\text{SSR}=\sum_{i=1}^n(Y_i-X_i^\text{T}\hat{\beta})^2$ is the residual sum of squares and $\text{SSE}=\sum_{i=1}^n(X_i^\text{T}\hat{\beta}-X_i^\text{T}\beta)^2$. The expansion in the second line holds because the cross term $2\sum_{i=1}^n(Y_i-X_i^\text{T}\hat{\beta})(X_i^\text{T}\hat{\beta}-X_i^\text{T}\beta)$ vanishes: the least-squares residuals are orthogonal to the column space of the design matrix. Only the factor involving $\text{SSE}$ depends on the parameter $\beta$ and hence characterises the model fit, while the factor involving $\text{SSR}$ is about the residual variability of the $Y_i$'s around their best prediction or projection, $X_i^\text{T}\hat{\beta}$.
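The decomposition is easy to check numerically. Here is a minimal sketch with NumPy on simulated data (the design, coefficients, and seed are all made up for illustration): for the least-squares fit $\hat\beta$, the sum of squares about *any* candidate $\beta$ splits exactly into the two pieces above.

```python
import numpy as np

# Illustrative check: for the OLS fit beta_hat and ANY candidate beta,
#   sum (Y - X beta)^2 = sum (Y - X beta_hat)^2 + sum (X beta_hat - X beta)^2
# because the cross term vanishes (residuals are orthogonal to the columns of X).
rng = np.random.default_rng(0)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept + one predictor
beta_true = np.array([1.0, 2.0])
Y = X @ beta_true + rng.normal(scale=0.5, size=n)

beta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)

beta = np.array([0.3, -1.7])                 # an arbitrary candidate beta
lhs = np.sum((Y - X @ beta) ** 2)
rhs = np.sum((Y - X @ beta_hat) ** 2) + np.sum((X @ beta_hat - X @ beta) ** 2)
print(np.isclose(lhs, rhs))                  # True
```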

The principle underlying least squares regression is that the sum of the squares of the errors is minimized. We can use calculus to find equations for the parameters $\beta_0$ and $\beta_1$ that minimize the sum of the squared errors, $S$.

$$S = \displaystyle\sum\limits_{i=1}^n \left(e_i \right)^2= \sum \left(y_i - \hat{y_i} \right)^2= \sum \left(y_i - \beta_0 - \beta_1x_i\right)^2$$

We want to find $\beta_0$ and $\beta_1$ that minimize the sum, $S$. We start by taking the partial derivative of $S$ with respect to $\beta_0$ and setting it to zero.

$$\frac{\partial{S}}{\partial{\beta_0}} = \sum 2\left(y_i - \beta_0 - \beta_1x_i\right)^1 (-1) = 0$$ $$\sum \left(y_i - \beta_0 - \beta_1x_i\right) = 0 $$ $$\sum \beta_0 = \sum y_i -\beta_1 \sum x_i $$ $$n\beta_0 = \sum y_i -\beta_1 \sum x_i $$ $$\beta_0 = \frac{1}{n}\sum y_i -\beta_1 \frac{1}{n}\sum x_i \tag{1}$$ $$\beta_0 = \bar y - \beta_1 \bar x \tag{*} $$
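Note what the intermediate step $\sum(y_i - \beta_0 - \beta_1 x_i) = 0$ says: at the optimum, the residuals of a fit with an intercept sum to exactly zero. A quick check on made-up data, using `np.polyfit` as the fitter:

```python
import numpy as np

# The first-order condition for beta_0 says the fitted residuals sum to zero
# whenever the model includes an intercept. Illustrative data below.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

b1, b0 = np.polyfit(x, y, deg=1)        # np.polyfit returns (slope, intercept)
residuals = y - (b0 + b1 * x)
print(abs(residuals.sum()) < 1e-8)      # True
```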

Now take the partial derivative of $S$ with respect to $\beta_1$ and set it to zero.

$$\frac{\partial{S}}{\partial{\beta_1}} = \sum 2\left(y_i - \beta_0 - \beta_1x_i\right)^1 (-x_i) = 0$$
$$\sum x_i \left(y_i - \beta_0 - \beta_1x_i\right) = 0$$
$$\sum x_iy_i - \beta_0 \sum x_i - \beta_1 \sum x_i^2 = 0 \tag{2}$$
Substitute $(1)$ into $(2)$:
$$\sum x_iy_i - \left( \frac{1}{n}\sum y_i -\beta_1 \frac{1}{n}\sum x_i\right) \sum x_i - \beta_1 \sum x_i^2 = 0 $$
$$\sum x_iy_i - \frac{1}{n} \sum x_i \sum y_i + \beta_1 \frac{1}{n} \left( \sum x_i \right) ^2 - \beta_1 \sum x_i^2 = 0 $$

$$\sum x_iy_i - \frac{1}{n} \sum x_i \sum y_i = - \beta_1 \frac{1}{n} \left( \sum x_i \right) ^2 + \beta_1 \sum x_i^2 $$ $$\sum x_iy_i - \frac{1}{n} \sum x_i \sum y_i = \beta_1 \left(\sum x_i^2 - \frac{1}{n} \left( \sum x_i \right) ^2 \right) $$ $$\beta_1 = \frac{\sum x_iy_i - \frac{1}{n} \sum x_i \sum y_i}{\sum x_i^2 - \frac{1}{n} \left( \sum x_i \right) ^2 } = \frac{\operatorname{cov}(x,y)}{\operatorname{var}(x)}\tag{*}$$
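The two starred formulas can be sketched in code and checked against a library fitter. The data below are made up for illustration; `np.polyfit` serves as the reference implementation:

```python
import numpy as np

# Check the closed-form OLS estimates against np.polyfit on toy data.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.1, 5.9, 8.2, 9.9])
n = len(x)

# beta_1 from (*): sample covariance over sample variance.
b1 = (np.sum(x * y) - np.sum(x) * np.sum(y) / n) / (np.sum(x**2) - np.sum(x)**2 / n)
# beta_0 from (*): y-bar minus beta_1 times x-bar.
b0 = y.mean() - b1 * x.mean()

slope, intercept = np.polyfit(x, y, deg=1)
print(np.allclose([b0, b1], [intercept, slope]))  # True
```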

## Best Answer

Squaring the residuals changes the shape of the loss function. In particular, large errors are penalized disproportionately more under the squared error. Imagine two cases: in one, you have a point with an error of 0 and another with an error of 10; in the other, you have two points each with an error of 5. A linear (absolute) error function treats both cases as having the same total error of 10, while the squared error penalizes the case with the single large error far more heavily (100 versus 50).
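The two cases from the paragraph above, in numbers:

```python
# Two residual patterns with the same absolute total but very
# different squared totals.
case_a = [0, 10]   # one perfect point, one far-off point
case_b = [5, 5]    # two moderately-off points

abs_a, abs_b = sum(abs(e) for e in case_a), sum(abs(e) for e in case_b)
sq_a, sq_b = sum(e**2 for e in case_a), sum(e**2 for e in case_b)

print(abs_a, abs_b)  # 10 10   -> the linear loss cannot tell them apart
print(sq_a, sq_b)    # 100 50  -> the squared loss strongly prefers case_b
```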

With squared residuals, the solution will prefer many small errors over any single large one. The linear residual is indifferent: it does not care whether the total error comes from one sample or is spread across many tiny errors.

You could also raise the error to a higher power to penalize large errors even more severely. Summing the tenth power of the residuals, for example, would likely yield a solution that tolerates moderate errors at most points but avoids a large error at any single point.
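One way to see this effect concretely is the simplest possible regression: fitting a single constant $c$ to the data under the loss $\sum_i |y_i - c|^p$. It is a known result that $p=1$ gives the median and $p=2$ the mean, while large $p$ pushes the fit toward the midrange, which caps the single largest error. A small grid-search sketch (the data, with one outlier, are made up for illustration):

```python
import numpy as np

# Fit a single constant c to data under loss sum |y - c|^p via grid search.
# Higher p tolerates many moderate errors to avoid one large error.
y = np.array([0.0, 1.0, 2.0, 10.0])            # one outlier at 10
grid = np.linspace(-1, 11, 100001)

def fit(p):
    losses = np.abs(y[:, None] - grid[None, :]) ** p
    return grid[losses.sum(axis=0).argmin()]

print(fit(1))   # somewhere in [1, 2]: the median region (outlier ignored)
print(fit(2))   # ~3.25: the mean (pulled toward the outlier)
print(fit(10))  # close to 5: near the midrange (caps the largest error)
```

With `p=1` the outlier barely moves the fit; with `p=2` it drags the fit to the mean; with `p=10` the fit sits near the midpoint of the extremes, because any large single residual dominates the tenth-power sum.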