Determinant of diag(covariance) >= determinant of covariance

Tags: covariance matrices, probability distributions

I was playing with generating data from a multivariate Gaussian distribution whose covariance matrix has non-zero off-diagonal elements (i.e., a generic covariance matrix).

I noticed that, whatever covariance matrix I use for generating data, I get

$$\det(\text{diag}(\hat{\Sigma})) > \det{(\hat{\Sigma})}$$

where $\hat{\Sigma}$ is the empirical covariance matrix.

Here's a short code example in Python, in case it's helpful:

import numpy as np

data = np.random.multivariate_normal(mean=[0, 0, 0], cov=np.array([[2, 0.9, 0.3], 
                                                                   [0.9, 4, 0.4], 
                                                                   [0.3, 0.4, 3]]), size=10000)

data = data - data.mean(axis=0)  # center, so Σ_full below is the empirical covariance
Σ_full = data.T @ data / len(data)
Σ_diag = Σ_full * np.eye(3)

print(np.linalg.det(Σ_diag) - np.linalg.det(Σ_full))  # <- this is positive

Since $\hat{\Sigma} \approx \Sigma$, in hindsight my question is really: given a positive definite matrix, is the determinant of its diagonal part always greater than or equal to the determinant of the matrix itself?
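A quick numerical check of the claim (this is my own sketch, not part of the original post): I sample random positive definite matrices as $BB^\top$ plus a small ridge and test the inequality directly.

```python
import numpy as np

rng = np.random.default_rng(0)

# Sample random positive definite matrices as B @ B.T (the small ridge
# keeps them safely nonsingular) and check the claim directly.
for _ in range(1000):
    n = int(rng.integers(2, 6))
    B = rng.standard_normal((n, n))
    S = B @ B.T + 1e-6 * np.eye(n)
    assert np.prod(np.diag(S)) >= np.linalg.det(S)

print("det(diag(S)) >= det(S) held in every trial")
```

Equality is attained exactly when the matrix is already diagonal, which the random draws above essentially never are.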


I'm developing an intuition of this phenomenon as follows.

We know that the determinant of $\text{diag}{(\Sigma)}$ is just the product of its diagonal entries (i.e., the variances), and the determinant of $\Sigma$ is the product of its eigenvalues.
This suggests a geometric argument: if I can show that $\Sigma$ scales a unit volume less than $\text{diag}(\Sigma)$ does, then the claim follows.

A thought along these lines is that the column vectors of $\text{diag}{(\Sigma)}$ are orthogonal, while those of $\Sigma$ are not. If the column vectors all had unit length, then clearly $\Sigma$ would scale a unit volume less than $\text{diag}(\Sigma)$, because the columns of $\Sigma$ point closer together in space and span a "flatter" parallelepiped. The problem is that the column vectors are not of unit length.
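The unit-length intuition can be made concrete (my own sketch, using the covariance matrix from the code above): normalize each column to unit length, and the "tilted" columns of $\Sigma$ span a parallelepiped of volume below 1, while $\text{diag}(\Sigma)$ normalizes to the identity with volume exactly 1.

```python
import numpy as np

Sigma = np.array([[2.0, 0.9, 0.3],
                  [0.9, 4.0, 0.4],
                  [0.3, 0.4, 3.0]])

# Rescale each column to unit length.
Sigma_unit = Sigma / np.linalg.norm(Sigma, axis=0)

# diag(Sigma) with unit-length columns is just the identity.
Diag_unit = np.diag(np.diag(Sigma)) / np.diag(Sigma)

print(np.linalg.det(Sigma_unit))  # < 1: the columns tilt toward each other
print(np.linalg.det(Diag_unit))   # exactly 1: orthogonal unit columns
```

This is precisely the column-norm form of Hadamard's inequality, $|\det M| \le \prod_i \lVert m_i \rVert$ for columns $m_i$, with equality iff the columns are orthogonal.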

Best Answer

This result is known as Hadamard's inequality. First, notice that any $n\times n$ covariance matrix $\Sigma$ is positive semi-definite. If $\Sigma$ is singular, then $\operatorname{det}(\Sigma)=0$ and the inequality holds trivially, so we may assume that $\Sigma$ is positive definite. Let $A=\operatorname{diag}(\Sigma)^{-1/2}$. Since $\operatorname{det}(A\Sigma A)=\operatorname{det}(\Sigma)/\prod_{i=1}^n\Sigma_{ii}$, we have $\operatorname{det}(A\Sigma A)\le 1$ iff $\operatorname{det}(\Sigma)\le \prod_{i=1}^n\Sigma_{ii}$; moreover, $A\Sigma A$ is positive definite with unit diagonal, so we may assume that $\Sigma_{11}=\cdots=\Sigma_{nn}=1$. Finally, $$ \operatorname{det}(\Sigma)=\prod_{i=1}^n \lambda_i\le \left(\frac{1}{n}\sum_{i=1}^n\lambda_i\right)^n=\left(\frac{1}{n}\operatorname{tr}(\Sigma)\right)^n=1 $$ by the AM-GM inequality, where $\lambda_1,\ldots,\lambda_n$ are the eigenvalues of $\Sigma$.
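The two steps of the proof can be traced numerically; a sketch (mine, using the example covariance matrix from the question) that checks the rescaling and the AM-GM bound:

```python
import numpy as np

Sigma = np.array([[2.0, 0.9, 0.3],
                  [0.9, 4.0, 0.4],
                  [0.3, 0.4, 3.0]])
n = Sigma.shape[0]

# Step 1: rescale so the diagonal becomes all ones.
A = np.diag(1.0 / np.sqrt(np.diag(Sigma)))
C = A @ Sigma @ A                       # unit diagonal (a correlation matrix)
assert np.allclose(np.diag(C), 1.0)

# Step 2: AM-GM on the eigenvalues of C, whose sum is tr(C) = n.
lam = np.linalg.eigvalsh(C)
assert np.isclose(lam.sum(), n)
assert np.prod(lam) <= (lam.sum() / n) ** n   # i.e. det(C) <= 1

# Which is equivalent to the original claim:
assert np.linalg.det(Sigma) <= np.prod(np.diag(Sigma))
print("all proof steps check out numerically")
```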
