Orthogonal projection using orthogonal matrices: $QQ^T\neq Q^TQ$

linear algebra, orthogonal matrices, orthogonality

Here I define my "orthogonal" matrix $Q$ as having orthonormal columns, but not necessarily orthonormal rows, and $Q$ need not be square. This definition clashes a bit with the standard one, but it is consistent with the video I was watching, which proved the least squares method using the QR decomposition.

We always have $Q^TQ=I$: each entry of the product is a dot product of two columns of $Q$, which is $1$ or $0$ by orthonormality. But $QQ^T$ consists of the rows dotted with the rows, so it is not necessarily the identity matrix, and since $Q$ is not necessarily square, I do not know how to make sense of the following:

Suppose $x$ is some vector in $\mathbb{R}^m$, where $Q\in\mathbb{M}_{m\times n}$, and $x$ is not in the column space of $Q$. Apparently, the orthogonal projection $\hat{x}$ of $x$ onto the column space of $Q$ is $QQ^Tx$!
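For concreteness, here is a quick NumPy check with a random $Q$ (the sizes and seed are just an arbitrary example; the numbers agree with the claim, but of course prove nothing):

```python
import numpy as np

rng = np.random.default_rng(0)

# A 5x3 Q with orthonormal columns, from the reduced QR of a random matrix.
Q, _ = np.linalg.qr(rng.standard_normal((5, 3)))
x = rng.standard_normal(5)

print(np.allclose(Q.T @ Q, np.eye(3)))       # True:  Q^T Q = I (3x3)
print(np.allclose(Q @ Q.T, np.eye(5)))       # False: Q Q^T is 5x5 and not I

x_hat = Q @ Q.T @ x                          # the claimed projection of x
print(np.allclose((x - x_hat) @ x_hat, 0))   # True: residual orthogonal to x_hat
```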

I have tried to expand $(x-\hat{x})\cdot\hat{x}=0$ many times and in many ways, all to no avail. Computing $x\cdot(QQ^Tx)$ and $(QQ^Tx)\cdot(QQ^Tx)$ is a nightmare, and I have not been able to show them to be equal.

Does anyone know how to A) prove this and B) give an intuition for this?

When $Q$ is square and orthogonal in the standard sense, i.e. the rows also form an orthonormal set, then $Q$ and $Q^T$ are just rotations that invert each other, and it is all intuitive… but this $QQ^T$ in a non-square, not-fully-orthogonal setting makes no sense to me.

Best Answer

The orthogonal projection $P_V$ to a subspace $V$ is the unique linear transformation such that:

  1. for any $v\in V$ we have $P_V(v)=v$ and
  2. for any $u\in V^\perp$ we have $P_V(u)=0$.

Now, for $V$ the column space of $Q$:

  1. $v\in V$ means $v=Qw$ for some $w$. Then $QQ^Tv=Q(Q^TQ)w=Qw=v$, using $Q^TQ=I$.
  2. $u\in V^\perp$ means $Q^Tu=0$, i.e. $u\in\ker Q^T$, since the entries of $Q^Tu$ are the dot products of $u$ with the columns of $Q$. Then $QQ^Tu=Q0=0$.

Thus $P_V=QQ^T$.
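This also settles the expansion attempted in the question. The residual $x-QQ^Tx$ is killed by $Q^T$, using only $Q^TQ=I$:

$$Q^T\left(x-QQ^Tx\right)=Q^Tx-(Q^TQ)Q^Tx=Q^Tx-Q^Tx=0.$$

Hence the residual is orthogonal to every column of $Q$, and in particular $(x-\hat{x})\cdot\hat{x}=\big(Q^T(x-QQ^Tx)\big)\cdot\big(Q^Tx\big)=0$ for $\hat{x}=QQ^Tx$.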

For a more general matrix $A$ with linearly independent columns, the projection onto its column space is given by $A(A^TA)^{-1}A^T$, for the same reason. This is a standard formula in "least squares".
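As a numerical sanity check of that formula (a NumPy sketch with random data; any full-column-rank $A$ would do), the projection $A(A^TA)^{-1}A^Tb$ matches the least-squares fit $A\hat{w}$:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 3))   # random A: full column rank almost surely
b = rng.standard_normal(6)

P = A @ np.linalg.inv(A.T @ A) @ A.T            # projector onto col(A)
w_hat, *_ = np.linalg.lstsq(A, b, rcond=None)   # least-squares coefficients
print(np.allclose(P @ b, A @ w_hat))            # True: same projected vector
```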

The above reasoning only lets us verify the answer once we know it. To come up with it in the first place, we can reason as follows. Let the column space of $Q$ be $V$. The rows of $Q^T$ record exactly the quantities that must vanish for a vector to be orthogonal to $V$: the entries of $Q^Tu$ are the dot products of $u$ with the columns of $Q$, so $u\in V^\perp$ precisely when $Q^Tu=0$. Hence a linear map $P$ vanishes on $V^\perp$ precisely when it "factors through" $Q^T$, that is, when it is a composition of something with $Q^T$: $P=MQ^T$.

Now, in order for $P$ to be the identity on $V$ itself, we must have $PQw=Qw$ for all $w$, i.e. $MQ^TQw=Qw$. But for us $Q^TQ=I$, so taking $M=Q$ works, and thus $P=QQ^T$.
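Finally, the two formulas can be checked against each other numerically (again a NumPy sketch with random data): taking $Q$ from the reduced QR factorization of $A$, so that $Q$ and $A$ have the same column space, both recipes produce the same projector:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((6, 3))
Q, R = np.linalg.qr(A)             # reduced QR: col(Q) = col(A), Q^T Q = I

P_qr = Q @ Q.T                                  # projector via QR
P_ne = A @ np.linalg.inv(A.T @ A) @ A.T         # projector via normal equations
print(np.allclose(P_qr, P_ne))                  # True: same matrix
print(np.allclose(P_qr @ P_qr, P_qr))           # True: idempotent, as a projector
```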