The variance of the residuals in regression (matrix form)

linear-regression, matrices, variance

I am confused about the variance of the residuals in regression (matrix form).

Given $Y = X\hat{\beta} + \epsilon$

$Y$ is an $n \times 1$ response vector

$X$ is an $n \times p$ design matrix

$\hat{\beta} = (X^TX)^{-1}X^TY$ is a $p \times 1$ coefficient vector

$\epsilon$ is an $n \times 1$ residual vector
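
To make this concrete, here is a minimal NumPy sketch of the setup (not from the original post; the dimensions, seed, and coefficient values are made up for illustration):

```python
# A minimal numerical sketch of the setup above. Requires NumPy.
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 3
X = rng.normal(size=(n, p))                     # n x p design matrix
beta_true = np.array([1.0, -2.0, 0.5])          # hypothetical true coefficients
sigma = 1.5                                     # hypothetical noise sd
Y = X @ beta_true + sigma * rng.normal(size=n)  # n x 1 response vector

# beta_hat = (X^T X)^{-1} X^T Y, solved without forming the inverse explicitly
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
eps = Y - X @ beta_hat                          # n x 1 residual vector
```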

Then $var(\epsilon) = var(Y - X\hat{\beta}) = var(Y - X(X^TX)^{-1}X^TY)$ (1)

Let the projection matrix be $P = X(X^TX)^{-1}X^T$. It is idempotent ($P^2 = P$) and symmetric ($P^T = P$).
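
Continuing the sketch above, both properties can be checked numerically (up to floating-point error):

```python
# Build the projection (hat) matrix P = X (X^T X)^{-1} X^T
P = X @ np.linalg.solve(X.T @ X, X.T)

print(np.allclose(P @ P, P))   # idempotent: P^2 = P
print(np.allclose(P.T, P))     # symmetric:  P^T = P
```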

(1) becomes $var(Y - PY)$ (2)

Method 1: (2) = $var((I-P)Y) = (I-P)var(Y)(I-P)^T = \sigma^2(I-P)(I-P)^T = \sigma^2(I-P)$, using $var(Y) = \sigma^2I$ and the fact that $I-P$ is also symmetric and idempotent.
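
A Monte Carlo sketch of this result (again illustrative only; `reps` is arbitrary): simulate many responses with the same $X$ and fresh noise, and compare the sample covariance of the residual vector with $\sigma^2(I-P)$.

```python
# Sample covariance of residuals across simulated datasets vs. sigma^2 (I - P)
reps = 20_000
Ys = (X @ beta_true)[:, None] + sigma * rng.normal(size=(n, reps))
R = (np.eye(n) - P) @ Ys          # residual vectors, one column per replication
print(np.max(np.abs(np.cov(R) - sigma**2 * (np.eye(n) - P))))  # small; shrinks as reps grows
```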

Method 2: (2) = $var(Y)+var(PY) = \sigma^2I + P\sigma^2P^T = \sigma^2(I+P)$

I expected both methods to result in the same formula, but they do not. I am not sure what I am missing here.

Best Answer

In the second formula you assumed independence (or at least zero covariance) of $Y$ and $PY$, which does not hold.

You forgot two important terms:

$$var (Y - PY) = var (Y) + var(PY) + cov(Y, -PY) + cov(-PY, Y) $$
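
Working these out with $var(Y) = \sigma^2I$ and the symmetry of $P$,

$$cov(Y, -PY) = -cov(Y, Y)P^T = -\sigma^2 P, \qquad cov(-PY, Y) = -P\,cov(Y, Y) = -\sigma^2 P,$$

so

$$var(Y - PY) = \sigma^2 I + \sigma^2 P - \sigma^2 P - \sigma^2 P = \sigma^2(I - P),$$

which agrees with Method 1.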