This becomes easy when you reparameterize the problem.
Instead of parameterizing the line by a slope and an intercept, notice that when the $x_i$ take just two distinct values you can describe the fit by giving its value $\eta_0$ at $x=0$ and its value $\eta_1$ at $x=1$.
The figure shows the data as red dots and the OLS fit as a dashed line, and summarizes the two groups with boxplots, group $A$ at the left and group $B$ at the right. The slope of the line is precisely the amount needed to go from the mean of group $A$, with $\eta_0$ near $10$, to the mean of group $B$, with $\eta_1$ near $13$.
Least squares requires you to choose values of these parameters that minimize the sum of squares of residuals. Since the value of $\eta_0$ affects the residuals only for group $A$ (where $x_i=0$) and $\eta_1$ affects the residuals only for group $B$ (where $x_i=1$), each will be estimated as the mean of its associated group. Because these means also happen to be the Maximum Likelihood estimates (as well as the OLS estimates), the ML estimate of the slope (which is also its OLS estimate) must be
$$b_1 = \frac{\hat\eta_1 - \hat\eta_0}{1-0} = \hat\eta_1 -\hat\eta_0,$$
which is just the difference in the group means. The OLS estimate of the error variance (which does differ from the ML estimate, so we cannot exploit ML at this point) is the sum of squared residuals divided by the degrees of freedom, $n-2$. It should be equally clear that this is precisely the pooled variance used in the two-sample t-test. Consequently, $b_1/\operatorname{se}(b_1)$ is exactly the same, and computed in exactly the same way, as the Student t statistic.
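For a concrete check, here is a minimal numerical sketch (assuming NumPy and SciPy are available, with made-up group data): it fits the regression on a $0/1$ group indicator and confirms that the slope equals the difference in group means and that $b_1/\operatorname{se}(b_1)$ matches the pooled two-sample t statistic.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Two groups coded as x = 0 (group A) and x = 1 (group B); toy data
y_a = rng.normal(10, 2, size=20)   # group A, mean near 10
y_b = rng.normal(13, 2, size=25)   # group B, mean near 13
y = np.concatenate([y_a, y_b])
x = np.concatenate([np.zeros(20), np.ones(25)])

# OLS via the normal equations
X = np.column_stack([np.ones_like(x), x])
beta = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta
n, p = X.shape
s2 = resid @ resid / (n - p)                      # SSR / (n - 2): the pooled variance
se_b1 = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])

t_reg = beta[1] / se_b1
t_pooled = stats.ttest_ind(y_b, y_a, equal_var=True).statistic

print(beta[1], y_b.mean() - y_a.mean())   # identical: slope = difference in means
print(t_reg, t_pooled)                    # identical t statistics
```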
I think you would be much better off using matrix algebra throughout.
As for your first question, remember that the fits in a linear model are obtained as:
$$\hat{y} = H y,$$ where $H = X(X^TX)^{-1}X^T$ is the hat matrix.
Intuitively, $h_{ii}=1$ means that observation $y_i$ fully determines $\hat{y}_{i}$, so in a sense it has maximum leverage. If $h_{ii} \approx 0$, that would imply that observation $y_i$ plays very little role in determining $\hat{y}_{i}$, which would be determined mostly by the rest of the observations.
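As a quick illustration (a sketch assuming NumPy, with a made-up $x$ containing one far-out point), you can compute $H$ directly and see that the isolated observation gets leverage close to $1$ while the others stay small:

```python
import numpy as np

# Hypothetical data: one x value far from the rest gets high leverage
x = np.array([1.0, 2.0, 3.0, 4.0, 20.0])
X = np.column_stack([np.ones_like(x), x])     # design matrix with intercept

H = X @ np.linalg.inv(X.T @ X) @ X.T          # hat matrix: y_hat = H y
h = np.diag(H)
print(h)          # the last leverage is close to 1; the others are small
print(h.sum())    # trace of H equals the number of parameters (here 2)
```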
Best Answer
First note that this formula applies just to simple linear regression where you're modeling $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$.
$\newcommand{\1}{\mathbf 1}$We can represent our regression as $y = X\beta + \varepsilon$ with $X = (\1 \mid x)$, where $x \in \mathbb R^n$ is the non-intercept univariate predictor; by assumption $X$ is full rank, and this is equivalent to $x$ not being constant. This means
$$
H = X(X^TX)^{-1}X^T = (\1 \mid x)\left(\begin{array}{cc}n & x^T\1 \\ x^T\1 & x^Tx\end{array}\right)^{-1}{\1^T\choose x^T}.
$$
We can use the formula for the explicit inverse of a $2\times 2$ matrix to find
$$
(X^TX)^{-1} = \frac{1}{nx^Tx - (x^T\1)^2}\left(\begin{array}{cc}x^Tx & -x^T\1 \\ -x^T\1 & n\end{array}\right),
$$
so all together we can do the multiplication to get
$$
H = \frac{1}{n x^Tx - (\1^T x)^2}\left(x^Tx\cdot \1\1^T - x^T\1 \cdot (\1 x^T + x \1^T) + n xx^T\right).
$$
This means
$$
h_i = \frac{x^Tx - 2x^T\1\cdot x_i + nx_i^2}{n x^Tx - (\1^T x)^2}.
$$
For the numerator, I can use the fact that $\1^Tx = n \bar x$ to rewrite it as
$$
\begin{aligned}
x^Tx - 2nx_i\bar x + n x_i^2 &= x^Tx + n(x_i^2 - 2 x_i\bar x + \bar x^2 - \bar x^2) \\
&= x^Tx - n\bar x^2 + n(x_i - \bar x)^2.
\end{aligned}
$$
Can you finish from here?
(later update) For the sake of completeness I'll finish the proof now.
$(\1^T x)^2 = n^2(\1^T x / n)^2 = n^2{\bar x}^2$, so
$$
\begin{aligned}
h_i &= \frac{x^Tx - n\bar x^2 + n(x_i - \bar x)^2}{n x^Tx - (\1^T x)^2} \\
&= \frac{x^Tx - n\bar x^2 + n(x_i - \bar x)^2}{n x^Tx - n^2{\bar x}^2} \\
&= \frac 1n + \frac{(x_i - \bar x)^2}{x^Tx - n{\bar x}^2},
\end{aligned}
$$
and then it's well known that
$$
x^Tx - n{\bar x}^2 = \sum_{i}(x_i - \bar x)^2,
$$
so we're done.
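If you want a numerical sanity check of the result (a sketch assuming NumPy, with an arbitrary simulated $x$), compare the diagonal of $H$ with the closed form $h_i = \frac1n + \frac{(x_i-\bar x)^2}{\sum_j (x_j - \bar x)^2}$:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=8)                        # any non-constant univariate predictor
n = x.size
X = np.column_stack([np.ones(n), x])          # intercept plus single predictor

# Leverages from the hat matrix versus the closed-form expression
h_matrix = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)
h_formula = 1 / n + (x - x.mean())**2 / ((x - x.mean())**2).sum()

print(np.allclose(h_matrix, h_formula))       # True
```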