Understanding an identity for least squares regression line gradient

least squares, summation, weighted least squares

In section 2.2 of this paper, Gelman and Park present the following identity for the gradient of the least squares line through a set of 2D points:

…we recall a simple algebraic identity that expresses the least-squares regression of $y$ on $x$ as a weighted average of all pairwise comparisons:

$$\begin{align}
\hat\beta^{ls}&=\frac{\sum_i(y_i-\bar y)(x_i-\bar x)}{\sum_i(x_i-\bar x)^2}\\
&=\frac{\sum_{i,\,j}(y_i-y_j)(x_i-x_j)}{\sum_{i,\,j}(x_i-x_j)^2}\\
&=\frac{\sum_{i,\,j}\frac{y_i-y_j}{x_i-x_j}(x_i-x_j)^2}{\sum_{i,\,j}(x_i-x_j)^2}\end{align}$$

In the first line, which is a basic least squares result, the sums run over all the points. In the second and third lines the sums run over all ordered pairs of points.

It feels like I might be missing something obvious, but how do we go from the first line to the second?

Best Answer

Rewrite the numerator

Since $\bar y = \frac{1}{n}\sum_j y_j$ and $y_i = \frac{1}{n}\sum_j y_i$, each deviation from the mean is an average of pairwise differences:

\begin{equation} y_i-\bar y = \frac{1}{n}\sum_j (y_i-y_j) \end{equation}

The deviations sum to zero, $\sum_i (y_i-\bar y) = 0$, so the term involving $\bar x$ drops out:

\begin{equation} \sum_i(y_i-\bar y)(x_i-\bar x) = \sum_i(y_i-\bar y)\,x_i = \frac{1}{n}\sum_{i,\,j}(y_i-y_j)\,x_i \end{equation}

Swapping the names of the indices $i$ and $j$ does not change the double sum, so the same quantity also equals

\begin{equation} \frac{1}{n}\sum_{i,\,j}(y_j-y_i)\,x_j = -\frac{1}{n}\sum_{i,\,j}(y_i-y_j)\,x_j \end{equation}

Averaging the two expressions gives

\begin{equation} \sum_i(y_i-\bar y)(x_i-\bar x) = \frac{1}{2n} \sum_{i,\,j}(y_i-y_j)(x_i-x_j) \end{equation}


Rewrite the denominator

The denominator is the same sum with $y$ replaced by $x$, so the identical argument gives

\begin{equation} \sum_i(x_i-\bar x)^2 = \sum_i(x_i-\bar x)(x_i-\bar x) = \frac{1}{2n} \sum_{i,\,j}(x_i-x_j )^2 \end{equation}


Substitute

The factor $\frac{1}{2n}$ appears in both the numerator and the denominator and cancels:

\begin{equation} \frac{\sum_i(y_i-\bar y)(x_i-\bar x)}{\sum_i(x_i-\bar x)^2} = \frac{\frac{1}{2n} \sum_{i,\,j}(y_i-y_j)(x_i-x_j)}{\frac{1}{2n} \sum_{i,\,j}(x_i-x_j )^2} = \frac{\sum_{i,\,j}(y_i-y_j)(x_i-x_j)}{\sum_{i,\,j}(x_i-x_j)^2} \end{equation}

The third line of the identity then follows by writing $(y_i-y_j)(x_i-x_j) = \frac{y_i-y_j}{x_i-x_j}(x_i-x_j)^2$ for every pair with $x_i \neq x_j$; pairs with $x_i = x_j$ contribute zero to both sums.
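A quick numerical check can make the identity easier to trust. The following is a minimal sketch in plain Python, not anything taken from the paper: the function name `slope_forms` and the sample data are made up for illustration. It computes the slope from the three lines of the identity, using sums over points for the first and sums over all ordered pairs for the other two.

```python
from itertools import product


def slope_forms(x, y):
    """Return the least-squares slope computed from each line of the identity."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n

    # Line 1: sums over points, centered at the means.
    centered = (
        sum((yi - y_bar) * (xi - x_bar) for xi, yi in zip(x, y))
        / sum((xi - x_bar) ** 2 for xi in x)
    )

    # Line 2: sums over all ordered pairs (i, j); pairs with i == j add zero.
    pairs = list(product(range(n), repeat=2))
    pairwise = (
        sum((y[i] - y[j]) * (x[i] - x[j]) for i, j in pairs)
        / sum((x[i] - x[j]) ** 2 for i, j in pairs)
    )

    # Line 3: weighted average of pairwise slopes with weights (x_i - x_j)^2.
    # Pairs with x_i == x_j carry zero weight and are skipped to avoid 0/0.
    usable = [(i, j) for i, j in pairs if x[i] != x[j]]
    weighted = (
        sum((y[i] - y[j]) / (x[i] - x[j]) * (x[i] - x[j]) ** 2 for i, j in usable)
        / sum((x[i] - x[j]) ** 2 for i, j in usable)
    )

    return centered, pairwise, weighted


# Hypothetical sample data, chosen only for illustration.
x = [0.0, 1.0, 2.0, 4.0]
y = [1.0, 3.0, 2.0, 7.0]
print(slope_forms(x, y))
```

The three returned values should agree up to floating-point rounding. The pairwise sums are larger than the centered sums by a constant factor, but that factor is the same in the numerator and the denominator, so the ratio is unaffected.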
