[Math] In linear regression, why is the hat matrix idempotent, symmetric, and p.s.d.

linear algebra, linear regression, regression, statistics

In linear regression,
$$y = X \beta + \epsilon$$
where $y$ is a $n \times 1$ vector of observations for the response variable,

$X = (x_{1}, \dots, x_{n})^{T}$ is the $n \times p$ data matrix whose $i$-th row is $x_{i}^{T}$, where $x_{i} \in \mathbb{R}^{p}$ contains the $p$ explanatory variables for observation $i$, $i = 1, \dots, n$, and $\epsilon$ is an $n \times 1$ vector of errors.

Further, assume that $\mathbb{E}[\epsilon_i] = 0$ and $\mathrm{var}(\epsilon_i) = \sigma^2$ for $i = 1, \dots, n$.

The least-squares estimate is
$$\hat{\beta} = (X^{T}X)^{-1}X^{T}y$$

The fitted values are
$$\hat{y} = X \hat{\beta} = X(X^{T}X)^{-1}X^{T}y = Py,$$
where $P = X(X^{T}X)^{-1}X^{T}$.
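
For reference, here is a minimal numerical sketch of this construction (the design matrix, response, and dimensions are arbitrary illustrative values, not real data), checking that $Py$ reproduces the fitted values $X\hat{\beta}$:

```python
import numpy as np

# Illustrative data: a random 20 x 3 design matrix and a random response.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = rng.normal(size=20)

# Least-squares estimate and hat matrix, exactly as in the formulas above.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)   # (X^T X)^{-1} X^T y
P = X @ np.linalg.inv(X.T @ X) @ X.T           # X (X^T X)^{-1} X^T

print(np.allclose(P @ y, X @ beta_hat))        # True: P y gives the fitted values
```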

$P$ is a projection matrix (the hat matrix). It has the following properties:

  1. idempotent, meaning $PP = P$
  2. symmetric
  3. positive semi-definite

For property 1, what's the intuition behind this? How can you take some matrix, apply a transformation, an inverse, and a multiplication, and end up with something idempotent? It's an important concept, but it's hard to get an intuition just by following through the math.

Why do we get property 2 and property 3? How am I supposed to think about these?

Best Answer

I believe you're asking for the intuition behind those three properties of the hat matrix, so I'll try to rely on intuition alone and use as little math and as few higher-level linear algebra concepts as possible.

Preliminaries

Start with the fact that the projection matrix $P$ allows you to obtain the orthogonal projection of an arbitrary vector onto the column space of $X$. Let's use $v_p$ for the orthogonal projection of $v$:
$$ P v = v_p $$
You can use $P$ to decompose any vector $v$ into two components that are orthogonal to each other. Think of $v_n$ as what is "left over" after $v$ is projected onto the column space of $X$, so it is orthogonal to the column space of $X$ (and to any vector in that column space):
$$ v = v_p + v_n $$
$$ v_p \perp v_n $$
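
If it helps to see this decomposition concretely, here is a small numeric sketch (the matrix, vector, and dimensions are arbitrary illustrative choices, not from the post): it builds $P$, splits a vector into $v_p$ and $v_n$, and checks that the two pieces are orthogonal.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(20, 3))                  # arbitrary design matrix
P = X @ np.linalg.inv(X.T @ X) @ X.T          # projection onto the column space of X
v = rng.normal(size=20)                       # arbitrary vector

v_p = P @ v          # orthogonal projection of v onto the column space of X
v_n = v - v_p        # the "left over" part

print(np.allclose(v, v_p + v_n))     # v decomposes exactly into v_p + v_n
print(np.isclose(v_p @ v_n, 0.0))    # v_p and v_n are orthogonal
print(np.allclose(X.T @ v_n, 0.0))   # v_n is orthogonal to every column of X
```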

1. Why does $PP = P$?

Intuitively, projecting a vector onto a subspace twice in a row has the same effect as projecting it onto that subspace once. The second projection has no effect because the vector is already in the subspace from the first projection.

Less intuitive

If that isn't intuitive, it may be easier to consider the equivalent question: why does $P P v = P v$ hold for any arbitrary vector $v$?

Start by simplifying the left-hand side: $$ P (P v) = P v_p $$ since $P v = v_p$.

Next consider $P v_p$, which (by the definition of $P$) projects $v_p$ onto the column space of $X$. This has no effect since $v_p$ is already entirely in the column space of $X$. Therefore
$$ P v_p = v_p $$
Since $v_p = P v$, we conclude:
$$ P v_p = P v $$
Chaining these equations together gives:
$$ P P v = P v $$
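
The same argument is easy to confirm numerically. This sketch (with arbitrary random data, not part of the original answer) checks that $PP$ agrees with $P$ and that applying $P$ a second time leaves $Pv$ unchanged:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(20, 3))            # arbitrary design matrix
P = X @ np.linalg.inv(X.T @ X) @ X.T    # hat matrix
v = rng.normal(size=20)                 # arbitrary vector

print(np.allclose(P @ P, P))            # idempotent: P P = P
print(np.allclose(P @ (P @ v), P @ v))  # projecting twice = projecting once
```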

2. Why is P symmetric?

Intuitively, consider two arbitrary vectors $v$ and $w$. Take the dot product of one vector with the projection of the other vector:
$$ (P v) \cdot w $$
$$ v \cdot (P w) $$

In both dot products, one term ($P v$ or $P w$) lies entirely in the ‘projected space’ (column space of X), so both dot products ignore everything that is not in the column space of X. This means both dot products are equal. Some simple dot product identities then imply that $P = P^T$, so $P$ is symmetric.

Less intuitive

If that isn't intuitive, we first prove that both dot products are equal. Decompose $v$ and $w$ as shown in the preliminaries above:
$$ v = v_p + v_n $$
$$ w = w_p + w_n $$
The projection of a vector lies in a subspace. The dot product of anything in this subspace with anything orthogonal to this subspace is zero. We use this fact on the dot product of one vector with the projection of the other vector:
$$ (P v) \cdot w \hspace{1cm} v \cdot (P w) $$
$$ v_p \cdot w \hspace{1cm} v \cdot w_p $$
$$ v_p \cdot (w_p + w_n) \hspace{1cm} (v_p + v_n) \cdot w_p $$
$$ v_p \cdot w_p + v_p \cdot w_n \hspace{1cm} v_p \cdot w_p + v_n \cdot w_p $$
$$ v_p \cdot w_p \hspace{1cm} v_p \cdot w_p $$
Therefore
$$ (Pv) \cdot w = v \cdot (Pw) $$
Next, we can show that a consequence of this equality is that the projection matrix $P$ must be symmetric. We begin by expressing the dot products in terms of transposes and matrix multiplication (using the identity $x \cdot y = x^T y$):
$$ (P v) \cdot w = v \cdot (P w) $$
$$ (P v)^T w = v^T (P w) $$
$$ v^T P^T w = v^T P w $$
Since $v$ and $w$ can be any vectors, the above equality implies:
$$ P^T = P $$
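
Numerically, the same equality can be checked with a short sketch (again with arbitrary illustrative data): the two dot products $(Pv) \cdot w$ and $v \cdot (Pw)$ agree, and $P$ equals its transpose.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(20, 3))                  # arbitrary design matrix
P = X @ np.linalg.inv(X.T @ X) @ X.T          # hat matrix
v, w = rng.normal(size=20), rng.normal(size=20)

print(np.isclose((P @ v) @ w, v @ (P @ w)))   # (Pv) . w == v . (Pw)
print(np.allclose(P, P.T))                    # symmetric: P = P^T
```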

3. Why is P positive semidefinite?

By definition, a matrix $P$ is positive semidefinite if and only if for every column vector $v$:
$$ v^T P v \ge 0 $$
or equivalently:
$$ v \cdot (P v) \ge 0 $$
Intuitively, a dot product projects one vector onto the other and then scales by the length of that other vector. We want to show that this dot product is non-negative. In the equation immediately above, $v \cdot (P v)$ means "project $v$ onto $P v$ and scale by the length of $P v$". The first part, projecting $v$ onto $P v$, is equivalent to projecting $v$ onto $v_p$, since $P v = v_p$.

Projecting $v$ onto $v_p$ projects $v$ onto something that lies entirely in the column space of $X$, so this projection is just $v_p$. Next, scaling this $v_p$ by the length of $v_p$ squares its length, and a squared length must be non-negative.

Less intuitive

If that isn't intuitive, the dot product can be simplified by decomposing $v$ into orthogonal components:
$$ v \cdot (P v) $$
$$ (v_p + v_n) \cdot (P v) $$
$$ (v_p + v_n) \cdot v_p $$
$$ v_p \cdot v_p + v_n \cdot v_p $$
Since $v_p$ and $v_n$ are orthogonal, the second term is zero and we are left with
$$ v_p \cdot v_p $$
This quantity is the squared length of the vector $v_p$ (i.e., $\|v_p\|_2^2$), which must be non-negative:
$$ v_p \cdot v_p = \|v_p\|_2^2 \ge 0 $$
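
As a final numeric sketch (arbitrary random data again), the quadratic form $v^T P v$ matches $\|v_p\|_2^2$ and is therefore non-negative:

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(20, 3))            # arbitrary design matrix
P = X @ np.linalg.inv(X.T @ X) @ X.T    # hat matrix
v = rng.normal(size=20)                 # arbitrary vector

v_p = P @ v
quad = v @ (P @ v)                   # the quadratic form v^T P v
print(np.isclose(quad, v_p @ v_p))   # it equals ||v_p||_2^2 ...
print(quad >= 0)                     # ... and is therefore non-negative
```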
