Solved – Appropriate measure to find smallest covariance matrix

covariance-matrixgeometryintuitionlinear algebramatrix

In the textbook I am reading they use positive definiteness (semi-positive definiteness) to compare two covariance matrices. The idea being that if $A-B$ is pd then $B$ is smaller than $A$. But I'm struggling to get the intuition of this relationship?

There is a similar thread here:

https://math.stackexchange.com/questions/239166/what-is-the-intuition-for-using-definiteness-to-compare-matrices

What is the intuition for using definiteness to compare matrices?

Although the answers are nice they don't really address the intuition.

Here is an example I find confusing:

\begin{equation}
\begin{bmatrix} 16 & 12 \\ 12 & 9 \end{bmatrix} – \begin{bmatrix} 1 & 2 \\ 2 & 4 \end{bmatrix}
\end{equation}

now here the determinant of the difference is -25 so the relation is not pd or even psd and so the first matrix is not greater than the first?

I simply want to compare two 3*3 covariance matrices to see which is smallest? It would seem more intuitive to me to use the something like the euclidean norm to compare them? However this would mean that the first matrix above is greater than the second matix. Moreover I only ever see the pd/psd criterion used to compare covariance matrices.

Can someone explain why pd/psd is better than using another measure such as the euclidean norm?

I have also posted this question on the math forum (wasn't sure what was best) hope this does not contravene any rules.

https://math.stackexchange.com/questions/628135/comparing-two-covariance-matrices

Best Answer

The ordering of matrices you refer to is known as the Loewner order and is a partial order much used in the study of positive definite matrices. A book-length treatment of the geometry on the manifold of positive-definite (posdef) matrices is here.

I will first try to address your question about intuitions. A (symmetric) matrix $A$ is posdef if $c^T A c\ge 0$ for all $c \in \mathbb{R}^n$. If $X$ is a random variable (rv) with covariance matrix $A$, then $c^T X$ is (proportional to) its projection on some one-dim subspace, and $\mathbb{Var}(c^T X) = c^T A c$. Applying this to $A-B$ in your Q, first: it is a covariance matrix, second: A random variable with covar matrix $B$ projects in all directions with smaller variance than a rv with covariance matrix $A$. This makes it intuitively clear that this ordering can only be a partial one, there are many rv's which will project in different directions with wildly different variances. Your proposal of some Euclidean norm do not have such a natural statistical interpretation.

Your "confusing example" is confusing because both matrices have determinant zero. So for each one, there is one direction (the eigenvector with eigenvalue zero) where they always projects to zero. But this direction is different for the two matrices, therefore they cannot be compared.

The Loewner order is defined such that $A \preceq B$, $B$ is more positive definite than $A$, if $B-A$ is posdef. This is a partial order, for some posdef matrices neither $B-A$ nor $A-B$ is posdef. An example is: $$ A=\begin{pmatrix} 1 & 0.5 \\ 0.5 & 1 \end{pmatrix}, \quad B= \begin{pmatrix} 0.5 & 0\\ 0 & 1.5 \end{pmatrix} $$ One way of showing this graphically is drawing a plot with two ellipses, but centered at the origin, associated in a standard way with the matrices (then the radial distance in each direction is proportional to the variance of projecting in that direction):

Two posdef matrices shown as ellipses

In these case the two ellipses are congruent, but rotated differently (in fact the angle is 45 degrees). This corresponds to the fact that the matrices $A$ and $B$ have the same eigenvalues, but the eigenvectors are rotated.

As this answer depends much on properties of ellipses, the following What is the intuition behind conditional Gaussian distributions? explaining ellipses geometrically, can be helpful.

Now I will explain how the ellipses associated to the matrices are defined. A posdef matrix $A$ defines a quadratic form $Q_A(c) = c^T A c$. This can be plotted as a function, the graph will be a quadratic. If $A \preceq B$ then the graph of $Q_B$ will always be above the graph of $Q_A$. If we cut the graphs with a horizontal plane at height 1, then the cuts will describe ellipses (that is in fact a way of defining ellipses). This cut ellipses are the given by the equations $$ Q_A(c)=1, \quad Q_B(c)=1$$ and we see that $A \preceq B$ corresponds to the ellipse of B (now with interior) is contained within the ellipse of A. If there is no order, there will be no containment. We observe that the inclusion order is opposite to the Loewner partial order, if we dislike that we can draw ellipses of the inverses. This because $A \preceq B$ is equivalent to $B^{-1} \preceq A^{-1}$. But I will stay with the ellipses as defined here.

An ellipse can be describes with the semiaxes and their length. We will only discuss $2\times 2$-matrices here, as they are the ones we can draw ... So we need the two principal axes and their length. This can be found, as explained here with an eigendecomposition of the posdef matrix. Then the principal axes are given by the eigenvectors, and their length $a,b$ can be calculated from the eigenvalues $\lambda_1, \lambda_2$ by $$ a = \sqrt{1/\lambda_1}, \quad b=\sqrt{1/\lambda_2}. $$ We can also see that the area of the ellipse representing $A$ is $\pi a b= \pi \sqrt{1/\lambda_1}\sqrt{1/\lambda_2} = \frac{\pi}{\sqrt{\det A}}$.

I will give one final example where the matrices can be ordered:

Two matrices that can be ordered plotted as ellipses

The two matrices in this case were: $$A =\begin{pmatrix}2/3 & 1/5 \\ 1/5 & 3/4\end{pmatrix}, \quad B=\begin{pmatrix} 1& 1/7 \\ 1/7& 1 \end{pmatrix} $$

Related Question