Statistics – Understanding Leverage in Simple Linear Regression

Tags: linear regression, statistics

I am trying to understand how to derive the leverage in a simple linear regression (just one independent variable).

The leverage at value $X=x_i$ is known to be $h_{ii}=\frac{1}{n}+\frac{(x_i-\bar{x})^2}{\sum_j(x_j-\bar{x})^2}$

All the documents I found on the Internet go through the general formulation for multiple linear regression, rely on matrix notation and terminology, and end by reading the leverage off the diagonal of the so-called hat matrix…

I am sure that, in the case of a simple linear regression, the formula for the leverage can be derived without matrices, but I have not been able to find how.

Does anybody know? Any hint or link would be appreciated.

Best Answer

Back to the definitions:

Assuming $n$ observations $(x_i,y_i)_{ i=1\ldots n}$, the simple regression line is given by $$y=b_0+b_1x\qquad \text{ with } b_1=\frac{\sum_i(x_i-\bar x)(y_i-\bar y)}{\sum_i(x_i-\bar x)^2}\text{ and } b_0=\bar y-b_1\bar x$$
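As a quick sanity check, these closed-form coefficients can be computed directly and compared against a standard fitting routine (a minimal sketch with made-up data; the particular values of `x` and `y` are arbitrary):

```python
import numpy as np

# Hypothetical example data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

xbar, ybar = x.mean(), y.mean()
# Closed-form least-squares slope and intercept
b1 = np.sum((x - xbar) * (y - ybar)) / np.sum((x - xbar) ** 2)
b0 = ybar - b1 * xbar

# Should agree with np.polyfit(x, y, 1) up to floating-point error
b1_ref, b0_ref = np.polyfit(x, y, 1)
assert np.allclose([b0, b1], [b0_ref, b1_ref])
```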

Now the leverage of the $k$-th observation is defined as the partial derivative of the predicted value $\hat {y_k}=b_0+b_1x_k$ with respect to $y_k$.
A change in $y_k$ is going to affect $b_0$ and $b_1$ and therefore $\hat {y_k}$.

Let $y_k^\prime=y_k+\delta$ with all other $x_i$'s and $y_i$'s fixed. Then $$\bar y^\prime=\bar y+\frac{\delta}{n}$$ That's the easy part. To compute the change in $b_1$, note that every term in the numerator changes, since $\bar y$ changes (that's what I forgot to take into account in the first version of this answer). However, we can use the following trick: $$b_1=\frac{\sum_i(x_i-\bar x)(y_i-\bar y)}{\sum_i(x_i-\bar x)^2}=\frac{\sum_i(x_i-\bar x)(y_i)}{\sum_i(x_i-\bar x)^2}$$ because $\sum_i(x_i-\bar x)=0$ by definition of $\bar x$, and therefore $\sum_i(x_i-\bar x)\bar y=0$ since $\bar y$ doesn't depend on $i$. It follows that $$\begin{array}{rl}b_1^\prime=&b_1-\frac{(x_k-\bar x)(y_k)}{\sum_i(x_i-\bar x)^2}+\frac{(x_k-\bar x)(y_k^\prime)}{\sum_i(x_i-\bar x)^2}\\=&b_1+\frac{(x_k-\bar x)}{\sum_i(x_i-\bar x)^2}\cdot\delta\\b_0^\prime=&\bar y^\prime -b_1^\prime \bar x\\ =& b_0+\frac{\delta}{n}-\frac{(x_k-\bar x)}{\sum_i(x_i-\bar x)^2}\cdot\delta\bar x\end{array}$$

So that finally $$\begin{array}{rl}\hat y_k^\prime=&b_0^\prime+b_1^\prime x_k \\ =&b_0+\frac{\delta}{n}-\frac{(x_k-\bar x)}{\sum_i(x_i-\bar x)^2}\cdot\delta\bar x+b_1x_k+\frac{(x_k-\bar x)}{\sum_i(x_i-\bar x)^2}\cdot\delta x_k\end{array}$$ And the effect on $\hat y_k$ is $$\hat y_k^\prime -\hat y_k=\delta\left[\frac 1n+\frac{(x_k-\bar x)^2}{\sum_i(x_i-\bar x)^2}\right]$$ The rate of change (the multiplier of $\delta$) is the desired leverage.
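The derivative argument above can be checked numerically: perturb a single $y_k$ by $\delta$, refit, and compare the change in $\hat y_k$ against $\delta\, h_{kk}$. A sketch with made-up data (note that since $\hat y_k$ is linear in the $y_i$'s, the identity holds exactly for any finite $\delta$, not just infinitesimally):

```python
import numpy as np

def fit(x, y):
    """Closed-form simple-regression coefficients (b0, b1)."""
    xbar, ybar = x.mean(), y.mean()
    b1 = np.sum((x - xbar) * (y - ybar)) / np.sum((x - xbar) ** 2)
    return ybar - b1 * xbar, b1

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
k, delta = 2, 0.5  # perturb the third observation by delta

b0, b1 = fit(x, y)
y2 = y.copy()
y2[k] += delta
b0p, b1p = fit(x, y2)

# Change in the fitted value at x_k after the perturbation
change = (b0p + b1p * x[k]) - (b0 + b1 * x[k])
# Leverage from the formula derived above
h_kk = 1 / len(x) + (x[k] - x.mean()) ** 2 / np.sum((x - x.mean()) ** 2)
assert np.isclose(change, delta * h_kk)
```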


Another approach is to use the hat-matrix computation without ever writing a matrix:

Using the formulas for $b_0$ and $b_1$ one can prove that $$\hat y_i=\sum h_{i,j}y_j \qquad\text{ with }h_{i,j}=\frac 1n +\frac{(x_i-\bar x)(x_j-\bar x)}{\sum_k(x_k-\bar x)^2}$$ and from here the partial derivative is immediately equal to what we computed above.
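These entrywise coefficients $h_{i,j}$ can be verified directly (a sketch with the same kind of made-up data; the diagonal also sums to $2$, the number of fitted parameters):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
n, xbar = len(x), x.mean()
Sxx = np.sum((x - xbar) ** 2)

# h[i, j] = 1/n + (x_i - xbar)(x_j - xbar) / Sxx, built entrywise
h = 1 / n + np.outer(x - xbar, x - xbar) / Sxx

# Fitted values from the closed-form regression line
b1 = np.sum((x - xbar) * (y - y.mean())) / Sxx
b0 = y.mean() - b1 * xbar
yhat = b0 + b1 * x

assert np.allclose(h @ y, yhat)              # yhat_i = sum_j h_ij * y_j
lev = 1 / n + (x - xbar) ** 2 / Sxx
assert np.allclose(np.diag(h), lev)          # diagonal entries = leverages
```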
