I know the definition of symmetric positive definite (SPD) matrix, but want to understand more.
Why are they so important, intuitively?
Here is what I know. What else?
-
For a given data, Co-variance matrix is SPD. Co-variance matrix is a important metric, see this excellent post for intuitive explanation.
-
The quadratic form $\frac 1 2 x^\top Ax-b^\top x +c$ is convex, if $A$ is SPD. Convexity is a nice property for a function that can make sure the local solution is global solution. For Convex problems, there are many good algorithms to solve, but not for non-covex problems.
-
When $A$ is SPD, the optimization solution for the quadratic form $$\text{minimize}~~~ \frac 1 2 x^\top Ax-b^\top x +c$$ and the solution for linear system $$Ax=b$$ are the same. So we can run conversions between two classical problems. This is important because it enables us to use tricks discovered in one domain in the another. For example, we can use the conjugate gradient method to solve a linear system.
-
There are many good algorithms (fast, numerical stable) that work better for an SPD matrix, such as Cholesky decomposition.
EDIT: I am not trying ask the identities for SPD matrix, but the intuition behind the property to show the importance. For example, as mentioned by @Matthew Drury, if a matrix is SPD, Eigenvalues are all positive real numbers, but why all positive matters. @Matthew Drury had a great answer to flow and that is what I was looking for.
Best Answer
A (real) symmetric matrix has a complete set of orthogonal eigenvectors for which the corresponding eigenvalues are are all real numbers. For non-symmetric matrices this can fail. For example, a rotation in two dimensional space has no eigenvector or eigenvalues in the real numbers, you must pass to a vector space over the complex numbers to find them.
If the matrix is additionally positive definite, then these eigenvalues are all positive real numbers. This fact is much easier than the first, for if $v$ is an eigenvector with unit length, and $\lambda$ the corresponding eigenvalue, then
$$ \lambda = \lambda v^t v = v^t A v > 0 $$
where the last equality uses the definition of positive definiteness.
The importance here for intuition is that the eigenvectors and eigenvalues of a linear transformation describe the coordinate system in which the transformation is most easily understood. A linear transformation can be very difficult to understand in a "natural" basis like the standard coordinate system, but each comes with a "preferred" basis of eigenvectors in which the transformation acts as a scaling in all directions. This makes the geometry of the transformation much easier to understand.
For example, the second derivative test for the local extrema of a function $R^2 \rightarrow R$ is often given as a series of mysterious conditions involving an entry in the second derivative matrix and some determinants. In fact, these conditions simply encode the following geometric observation:
You can understand this with the geometric reasoning above in an eigenbasis. The first derivative at a critical point vanishes, so the rates of change of the function here are controlled by the second derivative. Now we can reason geometrically
Since the eigenvectors span the whole space, any other direction is a linear combination of eigen-directions, so the rates of change in those directions are linear combinations of the rates of change in the eigen directions. So in fact, this holds in all directions (this is more or less what it means for a function defined on a higher dimensional space to be differentiable). Now if you draw a little picture in your head, this makes a lot of sense out of something that is quite mysterious in beginner calculus texts.
This applies directly to one of your bullet points
The matrix of second derivatives is $A$ everywhere, which is symmetric positive definite. Geometrically, this means that if we move away in any eigen-direction (and hence any direction, because any other is a linear combination of eigen-directions) the function itself will bend away above it's tangent plane. This means the whole surface is convex.