Linear Algebra – Why Are Symmetric Positive Definite (SPD) Matrices So Important?

covariance-matrix, intuition, linear-algebra, mathematical-statistics, optimization

I know the definition of a symmetric positive definite (SPD) matrix, but I want to understand more.

Why are they so important, intuitively?

Here is what I know. What else?

  • For given data, the covariance matrix is positive semi-definite, and it is SPD as long as no variable is an exact linear combination of the others. The covariance matrix is an important object; see this excellent post for an intuitive explanation.

  • The quadratic form $\frac 1 2 x^\top Ax-b^\top x +c$ is convex if $A$ is SPD. Convexity is a nice property for a function: it guarantees that any local solution is a global solution. There are many good algorithms for solving convex problems, but not for non-convex ones.

  • When $A$ is SPD, the solution of the optimization problem $$\text{minimize}~~~ \frac 1 2 x^\top Ax-b^\top x +c$$ and the solution of the linear system $$Ax=b$$ are the same. So we can convert between two classical problems. This is important because it enables us to use tricks discovered in one domain in the other. For example, we can use the conjugate gradient method to solve a linear system.

  • There are many good algorithms (fast, numerically stable) that work better for an SPD matrix, such as the Cholesky decomposition; see the sketch right after this list.
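
To make these bullets concrete, here is a minimal NumPy/SciPy sketch (my addition, not from the original post): it builds a sample covariance matrix, checks that it is SPD, computes its Cholesky factor, and verifies that conjugate gradient applied to $Ax=b$ lands on the same point as directly minimizing the quadratic form. The random data and tolerances are illustrative assumptions.

```python
import numpy as np
from scipy.sparse.linalg import cg
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Sample covariance of random data: symmetric, and SPD as long as
# no column is an exact linear combination of the others.
X = rng.normal(size=(200, 4))
A = np.cov(X, rowvar=False)
print(np.allclose(A, A.T))                 # symmetric
print(np.all(np.linalg.eigvalsh(A) > 0))   # all eigenvalues positive

# The Cholesky factorization exists exactly because A is SPD.
L = np.linalg.cholesky(A)
print(np.allclose(L @ L.T, A))

# Solving Ax = b and minimizing (1/2) x^T A x - b^T x give the same x.
b = rng.normal(size=4)
x_cg, info = cg(A, b)                      # conjugate gradient on the linear system
f = lambda x: 0.5 * x @ A @ x - b @ x
x_opt = minimize(f, np.zeros(4)).x         # direct minimization of the quadratic
print(np.allclose(x_cg, x_opt, atol=1e-4))
```
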

EDIT: I am not asking for identities of SPD matrices, but for the intuition behind the property that shows why it matters. For example, as mentioned by @Matthew Drury, if a matrix is SPD, its eigenvalues are all positive real numbers, but why does having all positive eigenvalues matter? @Matthew Drury's answer below is what I was looking for.

Best Answer

A (real) symmetric matrix has a complete set of orthogonal eigenvectors for which the corresponding eigenvalues are all real numbers. For non-symmetric matrices this can fail. For example, a rotation in two-dimensional space has no eigenvectors or eigenvalues in the real numbers; you must pass to a vector space over the complex numbers to find them.

If the matrix is additionally positive definite, then these eigenvalues are all positive real numbers. This fact is much easier to prove than the first, for if $v$ is an eigenvector with unit length, and $\lambda$ the corresponding eigenvalue, then

$$ \lambda = \lambda v^\top v = v^\top A v > 0 $$

where the final inequality uses the definition of positive definiteness.

The importance here for intuition is that the eigenvectors and eigenvalues of a linear transformation describe the coordinate system in which the transformation is most easily understood. A linear transformation can be very difficult to understand in a "natural" basis like the standard coordinate system, but each comes with a "preferred" basis of eigenvectors in which the transformation acts as a scaling in all directions. This makes the geometry of the transformation much easier to understand.
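
As a rough illustration of this "preferred basis" idea (my own sketch, not part of the original answer), `numpy.linalg.eigh` returns the real eigenvalues and orthonormal eigenvectors of a symmetric matrix, and one can check numerically that the matrix acts as a pure scaling along each eigenvector. The particular random SPD matrix below is just an assumed example.

```python
import numpy as np

rng = np.random.default_rng(1)

# A random symmetric positive definite matrix (illustrative construction).
M = rng.normal(size=(3, 3))
A = M @ M.T + 3.0 * np.eye(3)

# eigh is meant for symmetric matrices: real eigenvalues, orthonormal eigenvectors.
eigvals, Q = np.linalg.eigh(A)
print(eigvals)                               # all positive real numbers
print(np.allclose(Q.T @ Q, np.eye(3)))       # eigenvectors are orthonormal

# In the eigenbasis the transformation is just a scaling in each direction:
# A = Q diag(eigvals) Q^T.
print(np.allclose(Q @ np.diag(eigvals) @ Q.T, A))

# For a unit eigenvector v, lambda = v^T A v > 0, matching the argument above.
v = Q[:, 0]
print(np.isclose(v @ A @ v, eigvals[0]), eigvals[0] > 0)
```
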

For example, the second derivative test for the local extrema of a function $\mathbb{R}^2 \rightarrow \mathbb{R}$ is often given as a series of mysterious conditions involving an entry in the second derivative matrix and some determinants. In fact, these conditions simply encode the following geometric observation:

  • If the matrix of second derivatives is positive definite, you're at a local minimum.
  • If the matrix of second derivatives is negative definite, you're at a local maximum.
  • If the matrix of second derivatives is indefinite, you are at neither: a saddle point. (If some eigenvalue is zero, the test is inconclusive.)

You can understand this with the geometric reasoning above in an eigenbasis. The first derivative at a critical point vanishes, so the rates of change of the function there are controlled by the second derivative. Now we can reason geometrically:

  • In the first case there are two eigen-directions, and if you move along either the function increases.
  • In the second, two eigen-directions, and if you move in either the function decreases.
  • In the last, there are two eigen-directions, but in one of them the function increases, and in the other it decreases.

Since the eigenvectors span the whole space, any other direction is a linear combination of eigen-directions, so the rates of change in those directions are linear combinations of the rates of change in the eigen-directions. So in fact, this holds in all directions (this is more or less what it means for a function defined on a higher-dimensional space to be differentiable). Now if you draw a little picture in your head, this makes a lot of sense out of something that is quite mysterious in beginner calculus texts.
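
Here is a small sketch of the second derivative test phrased directly in terms of Hessian eigenvalues (my addition, not from the original answer), using three textbook functions ($x^2+y^2$, $-x^2-y^2$, $x^2-y^2$) as assumed examples; the helper `classify` is hypothetical, not a standard API.

```python
import numpy as np

def classify(H):
    """Classify a critical point from the eigenvalues of the (symmetric) Hessian H."""
    w = np.linalg.eigvalsh(H)            # real eigenvalues, since H is symmetric
    if np.all(w > 0):
        return "local minimum"
    if np.all(w < 0):
        return "local maximum"
    if np.any(w > 0) and np.any(w < 0):
        return "saddle point"
    return "test inconclusive (zero eigenvalue)"

# Hessians at the critical point (0, 0) of three textbook functions:
print(classify(np.array([[2.0, 0.0], [0.0, 2.0]])))    # f = x^2 + y^2   -> minimum
print(classify(np.array([[-2.0, 0.0], [0.0, -2.0]])))  # f = -x^2 - y^2  -> maximum
print(classify(np.array([[2.0, 0.0], [0.0, -2.0]])))   # f = x^2 - y^2   -> saddle
```
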

This applies directly to one of your bullet points:

The quadratic form $\frac 1 2 x^\top Ax-b^\top x +c$ is convex if $A$ is SPD. Convexity is a nice property that can make sure a local solution is a global solution.

The matrix of second derivatives is $A$ everywhere, which is symmetric positive definite. Geometrically, this means that if we move away in any eigen-direction (and hence any direction, because any other is a linear combination of eigen-directions), the function bends upward away from its tangent plane. This means the whole surface is convex.
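
A quick numerical check of this "bends above its tangent plane" picture (my sketch, assuming a randomly generated SPD matrix $A$): the gap between the quadratic and its tangent plane at any point is $\tfrac 1 2 (x-x_0)^\top A (x-x_0)$, which is never negative when $A$ is SPD.

```python
import numpy as np

rng = np.random.default_rng(2)

# A random SPD matrix and the quadratic f(x) = 1/2 x^T A x - b^T x + c.
M = rng.normal(size=(4, 4))
A = M @ M.T + np.eye(4)
b, c = rng.normal(size=4), 1.0

f = lambda x: 0.5 * x @ A @ x - b @ x + c
grad = lambda x: A @ x - b            # gradient; the Hessian is A everywhere

# Tangent plane at an arbitrary point x0: f(x0) + grad(x0)^T (x - x0).
x0 = rng.normal(size=4)
tangent = lambda x: f(x0) + grad(x0) @ (x - x0)

# At random points the surface never dips below its tangent plane.
gaps = [f(x) - tangent(x) for x in rng.normal(size=(1000, 4))]
print(min(gaps) >= 0)                 # True: the quadratic is convex
```
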
