Solved – Leverage formula/derivation and Hat matrix


1)
So I know that $h_{ii}$ is the $i$th diagonal entry of $H=X(X^TX)^{-1}X^T$. Intuitively, why is this the case? I understand that $H$ is the projection matrix and that leverage measures how far an observation is from the other observations. I also understand it when looking at the formula for simple linear regression: $h_{ii}=\frac{1}{n}+\frac{(x_i-\bar{x})^2}{\sum_{j=1}^n(x_j-\bar{x})^2}$. But I don't understand how $x_i^T(X^TX)^{-1}x_i$, where $x_i^T$ is the $i$th row of $X$, measures leverage.
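(As a sanity check, here is a quick numerical experiment with numpy on made-up data, just to confirm that the diagonal of $H$, the quadratic form, and the SLR formula all agree; the data are arbitrary:)

```python
import numpy as np

# Made-up data purely for illustration
rng = np.random.default_rng(0)
x = rng.normal(size=10)
X = np.column_stack([np.ones_like(x), x])  # SLR design matrix [1, x_i]

# Hat matrix and its diagonal
H = X @ np.linalg.inv(X.T @ X) @ X.T
h_diag = np.diag(H)

# h_ii as the quadratic form x_i^T (X^T X)^{-1} x_i
h_quad = np.einsum('ij,jk,ik->i', X, np.linalg.inv(X.T @ X), X)

# SLR closed form: 1/n + (x_i - xbar)^2 / sum_j (x_j - xbar)^2
h_slr = 1 / len(x) + (x - x.mean())**2 / ((x - x.mean())**2).sum()

print(np.allclose(h_diag, h_quad), np.allclose(h_diag, h_slr))  # True True
```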

2)
Also, when trying to derive $h_{ii}$ for SLR, I'm getting $\frac{\sum x_j^2-2x_i\sum x_j+nx_i^2}{n\sum x_j^2-(\sum x_j)^2}$, which I can't simplify into the previous formula, so I assume I did it wrong. I used $H=X(X^TX)^{-1}X^T$ where
$$X=\left[\begin{array}{cc}
1 & x_1 \\
1 & x_2 \\
\vdots & \vdots \\
1 & x_n
\end{array}\right]$$
and ended up with
$$H=\frac{1}{n\sum x_j^2-(\sum x_j)^2}\left[\begin{array}{cccc}
\sum x_j^2-2x_1\sum x_j+nx_1^2 & \sum x_j^2-(x_1+x_2)\sum x_j+nx_1x_2 & \cdots & \sum x_j^2-(x_1+x_n)\sum x_j+nx_1x_n \\
\sum x_j^2-(x_1+x_2)\sum x_j+nx_1x_2 & \sum x_j^2-2x_2\sum x_j+nx_2^2 & \cdots & \vdots \\
\vdots & \vdots & \ddots & \vdots \\
\sum x_j^2-(x_1+x_n)\sum x_j+nx_1x_n & \cdots & \cdots & \sum x_j^2-2x_n\sum x_j+nx_n^2
\end{array}\right]$$
I had
$$(X^TX)^{-1}=\frac{1}{n\sum x_j^2-(\sum x_j)^2}\left[\begin{array}{cc}
\sum x_j^2 & -\sum x_j \\
-\sum x_j & n
\end{array}\right]$$
Anywhere obvious where I went wrong?
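(Edit: a small symbolic check with sympy, using $n=3$ just to keep it tractable, suggests the $h_{ii}$ expression above is at least consistent with $H=X(X^TX)^{-1}X^T$; the variable names here are my own:)

```python
import sympy as sp

xs = sp.symbols('x1:4')  # x1, x2, x3: a small n = 3 example
n = len(xs)
X = sp.Matrix([[1, xv] for xv in xs])  # SLR design matrix
H = X * (X.T * X).inv() * X.T          # hat matrix, symbolically

# The h_ii expression derived above, for i = 1 (index 0)
i = 0
S1, S2 = sum(xs), sum(xv**2 for xv in xs)
claimed = (S2 - 2*xs[i]*S1 + n*xs[i]**2) / (n*S2 - S1**2)

print(sp.simplify(H[i, i] - claimed))  # 0, so the derivation checks out
```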

Best Answer

I think you would be much better off using matrix algebra throughout.

As for your first question, remember that the fitted values in a linear model are obtained as:

$$\hat{y} = H y$$

Intuitively, the $i$th fitted value is $\hat{y}_i=\sum_j h_{ij}y_j$, so $h_{ii}=\partial\hat{y}_i/\partial y_i$ is the weight that observation $y_i$ gets in its own fit. If $h_{ii}=1$, observation $y_i$ fully determines $\hat{y}_i$, so in a sense it has maximum leverage. If $h_{ii}\approx 0$, observation $y_i$ plays almost no role in determining $\hat{y}_i$, which is then mostly determined by the rest of the observations.
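As for your second question, nothing actually went wrong: your expression does reduce to the textbook formula. Writing $S_{xx}=\sum_j(x_j-\bar{x})^2$ and noting that $nS_{xx}=n\sum_j x_j^2-(\sum_j x_j)^2$ and $S_{xx}+n(x_i-\bar{x})^2=\sum_j x_j^2-2x_i\sum_j x_j+nx_i^2$,
$$h_{ii}=\frac{\sum_j x_j^2-2x_i\sum_j x_j+nx_i^2}{n\sum_j x_j^2-\left(\sum_j x_j\right)^2}=\frac{S_{xx}+n(x_i-\bar{x})^2}{nS_{xx}}=\frac{1}{n}+\frac{(x_i-\bar{x})^2}{\sum_j(x_j-\bar{x})^2}.$$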
