With the following distributions on our random vectors:
$\mathbf x_i \mid \mu \sim N(\mu, \mathbf \Sigma)$
$\mu \sim N(\mu_0, \mathbf \Sigma_0)$
By Bayes's rule the posterior distribution looks like:
$p(\mu| \{\mathbf x_i\}) \propto p(\mu) \prod_{i=1}^N p(\mathbf x_i | \mu)$
So:
$\ln p(\mu| \{\mathbf x_i\}) = -\frac{1}{2}\sum_{i=1}^N(\mathbf x_i - \mu)'\mathbf \Sigma^{-1}(\mathbf x_i - \mu) -\frac{1}{2}(\mu - \mu_0)'\mathbf \Sigma_0^{-1}(\mu - \mu_0) + const$
$ = -\frac{1}{2} N \mu' \mathbf \Sigma^{-1} \mu + \sum_{i=1}^N \mu' \mathbf \Sigma^{-1} \mathbf x_i -\frac{1}{2} \mu' \mathbf \Sigma_0^{-1} \mu + \mu' \mathbf \Sigma_0^{-1} \mu_0 + const$
$ = -\frac{1}{2} \mu' (N \mathbf \Sigma^{-1} + \mathbf \Sigma_0^{-1}) \mu + \mu' (\mathbf \Sigma_0^{-1} \mu_0 + \mathbf \Sigma^{-1} \sum_{i=1}^N \mathbf x_i) + const$
$= -\frac{1}{2}(\mu - (N \mathbf \Sigma^{-1} + \mathbf \Sigma_0^{-1})^{-1}(\mathbf \Sigma_0^{-1} \mu_0 + \mathbf \Sigma^{-1} \sum_{i=1}^N \mathbf x_i))' (N \mathbf \Sigma^{-1} + \mathbf \Sigma_0^{-1}) (\mu - (N \mathbf \Sigma^{-1} + \mathbf \Sigma_0^{-1})^{-1}(\mathbf \Sigma_0^{-1} \mu_0 + \mathbf \Sigma^{-1} \sum_{i=1}^N \mathbf x_i)) + const$
Which is the log density of a Gaussian:
$\mu| \{\mathbf x_i\} \sim N((N \mathbf \Sigma^{-1} + \mathbf \Sigma_0^{-1})^{-1}(\mathbf \Sigma_0^{-1} \mu_0 + \mathbf \Sigma^{-1} \sum_{i=1}^N \mathbf x_i), (N \mathbf \Sigma^{-1} + \mathbf \Sigma_0^{-1})^{-1})$
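If you want to sanity-check the algebra numerically, here is a minimal NumPy/SciPy sketch (the covariances, prior mean, and data are all made up for illustration): the unnormalized log posterior and the claimed Gaussian log density should differ by the same constant (the log evidence) at every $\mu$.

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)
d, N = 3, 50

# Made-up SPD covariances and prior mean (illustrative only).
A = rng.standard_normal((d, d)); Sigma  = A @ A.T + d * np.eye(d)
B = rng.standard_normal((d, d)); Sigma0 = B @ B.T + d * np.eye(d)
mu0 = rng.standard_normal(d)
X = rng.multivariate_normal(rng.standard_normal(d), Sigma, size=N)

# Posterior parameters from the derivation above.
post_cov  = np.linalg.inv(N * np.linalg.inv(Sigma) + np.linalg.inv(Sigma0))
post_mean = post_cov @ (np.linalg.inv(Sigma0) @ mu0
                        + np.linalg.inv(Sigma) @ X.sum(axis=0))

# log p(mu) + sum_i log p(x_i | mu) minus the claimed posterior log density
# should be the same constant (the log evidence) at any mu.
for mu in rng.standard_normal((3, d)):
    diff = (multivariate_normal.logpdf(mu, mu0, Sigma0)
            + multivariate_normal.logpdf(X, mu, Sigma).sum()
            - multivariate_normal.logpdf(mu, post_mean, post_cov))
    print(diff)  # same value each time, up to floating point
```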
Using the Woodbury identity on our expression for the covariance matrix:
$(N \mathbf \Sigma^{-1} + \mathbf \Sigma_0^{-1})^{-1} = \mathbf \Sigma(\frac{1}{N} \mathbf \Sigma + \mathbf \Sigma_0)^{-1} \frac{1}{N} \mathbf \Sigma_0$
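This identity is easy to verify numerically with arbitrary SPD matrices; a minimal sketch (the matrix names are mine):

```python
import numpy as np

rng = np.random.default_rng(1)
d, N = 4, 25

# Arbitrary SPD matrices standing in for Sigma and Sigma_0.
A = rng.standard_normal((d, d)); Sigma  = A @ A.T + d * np.eye(d)
B = rng.standard_normal((d, d)); Sigma0 = B @ B.T + d * np.eye(d)

lhs = np.linalg.inv(N * np.linalg.inv(Sigma) + np.linalg.inv(Sigma0))
rhs = Sigma @ np.linalg.inv(Sigma / N + Sigma0) @ (Sigma0 / N)
print(np.allclose(lhs, rhs))  # True
```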
This gives the covariance matrix in the form the OP wanted. Using this expression (and, since the covariance is symmetric, its transposed form $\frac{1}{N} \mathbf \Sigma_0(\frac{1}{N} \mathbf \Sigma + \mathbf \Sigma_0)^{-1} \mathbf \Sigma$) in the expression for the mean, we have:
$\mathbf \Sigma(\frac{1}{N} \mathbf \Sigma + \mathbf \Sigma_0)^{-1} \frac{1}{N} \mathbf \Sigma_0 \mathbf \Sigma_0^{-1} \mu_0 + \frac{1}{N} \mathbf \Sigma_0(\frac{1}{N} \mathbf \Sigma + \mathbf \Sigma_0)^{-1} \mathbf \Sigma \mathbf \Sigma^{-1} \sum_{i=1}^N \mathbf x_i$
$= \mathbf \Sigma(\frac{1}{N} \mathbf \Sigma + \mathbf \Sigma_0)^{-1} \frac{1}{N} \mu_0 + \mathbf \Sigma_0(\frac{1}{N} \mathbf \Sigma + \mathbf \Sigma_0)^{-1} \frac{1}{N} \sum_{i=1}^N \mathbf x_i$
Which is the form required by the OP for the mean.
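And a corresponding check that the rewritten mean agrees with the direct expression, again with made-up inputs:

```python
import numpy as np

rng = np.random.default_rng(2)
d, N = 3, 40

# Hypothetical inputs, as before.
A = rng.standard_normal((d, d)); Sigma  = A @ A.T + d * np.eye(d)
B = rng.standard_normal((d, d)); Sigma0 = B @ B.T + d * np.eye(d)
mu0 = rng.standard_normal(d)
X = rng.multivariate_normal(rng.standard_normal(d), Sigma, size=N)

# Direct posterior mean.
direct = np.linalg.inv(N * np.linalg.inv(Sigma) + np.linalg.inv(Sigma0)) @ (
    np.linalg.inv(Sigma0) @ mu0 + np.linalg.inv(Sigma) @ X.sum(axis=0))

# Rewritten form: only one matrix inverse needed.
M = np.linalg.inv(Sigma / N + Sigma0)
rewritten = Sigma @ M @ (mu0 / N) + Sigma0 @ M @ X.mean(axis=0)

print(np.allclose(direct, rewritten))  # True
```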
What @whuber and I were saying to you in the comments amounts to the same thing. @whuber pointed out that the text you are reading makes the points column vectors, while I stuck to your own original notation, where points are row vectors (that way of presenting is more common). When points are columns, the transposed matrix ("T" in the text, or just ' in my notation) is the right-hand multiplier; when points are rows, it is the left-hand one. Instead of multiplying separate vectors, it's more convenient to multiply whole matrices. See it with your data (matrix A = your "Ck"); a sketch reproducing these numbers follows after the output:
****** Points are rows, variables are columns [more common] ******
A
1 2 3
4 5 6
Column-centered A
-1.500000000 -1.500000000 -1.500000000
1.500000000 1.500000000 1.500000000
A'A, the scatter matrix
4.500000000 4.500000000 4.500000000
4.500000000 4.500000000 4.500000000
4.500000000 4.500000000 4.500000000
****** Points are columns, variables are rows [that's how in your book] ******
A
1 4
2 5
3 6
Row-centered A
-1.500000000 1.500000000
-1.500000000 1.500000000
-1.500000000 1.500000000
AA', the scatter matrix
4.500000000 4.500000000 4.500000000
4.500000000 4.500000000 4.500000000
4.500000000 4.500000000 4.500000000
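If it helps, a short NumPy sketch that reproduces the output above (variable names are mine):

```python
import numpy as np

A = np.array([[1., 2., 3.],
              [4., 5., 6.]])        # points are rows

Ac = A - A.mean(axis=0)             # column-center
print(Ac.T @ Ac)                    # A'A: the 3x3 scatter matrix of 4.5's

B = A.T                             # points are columns, variables are rows
Br = B - B.mean(axis=1, keepdims=True)   # row-center
print(Br @ Br.T)                    # BB': the same 3x3 scatter matrix
```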
I think you would be much better off using matrix algebra throughout.
As for your first question, remember that the fits in a linear model are obtained as:
$$\hat{y} = H y$$
where $H = X(X'X)^{-1}X'$ is the hat matrix, so that $\hat{y}_i = \sum_j h_{ij} y_j$.
Intuitively, $h_{ii}=1$ means that observation $y_i$ fully determines $\hat{y}_{i}$, so in a sense it has maximum leverage. If $h_{ii} \approx 0$, observation $y_i$ plays very little role in determining $\hat{y}_{i}$, which is then mostly determined by the rest of the observations.
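A minimal sketch with a made-up design matrix (not your data), showing the $h_{ii}$ and the fact that they sum to the number of parameters $p$, so their average is $p/n$:

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 20, 3

# Made-up design matrix with an intercept column (illustrative only).
X = np.column_stack([np.ones(n), rng.standard_normal((n, p - 1))])

H = X @ np.linalg.inv(X.T @ X) @ X.T   # hat matrix: yhat = H y
leverage = np.diag(H)                  # the h_ii values

print(leverage.min(), leverage.max())  # with an intercept, 1/n <= h_ii <= 1
print(np.isclose(leverage.sum(), p))   # trace(H) = p, so mean leverage is p/n
```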