[Math] What does Determinant of Covariance Matrix give

covariance-matrices

I am representing my 3D data using its sample covariance matrix. I want to know what the determinant of the covariance matrix represents. If the determinant is positive, zero, negative, large positive, or large negative, what does it mean or represent?

Thanks

EDIT:

The covariance matrix is being used to represent the variance of 3D coordinates that I have. If the determinant of covariance matrix A is +100 and the determinant of covariance matrix B is +5, which value indicates greater variance? Which value tells me that the data points are more dispersed, i.e., further away from the mean?

Best Answer

I would like to point out that there is a connection between the determinant of the covariance matrix of (Gaussian distributed) data points and the differential entropy of the distribution.

To put it in other words: say you have a (large) set of points which you assume to be Gaussian distributed. If you compute the determinant of the sample covariance matrix, then you measure (indirectly) the differential entropy of the distribution, up to constant factors and a logarithm. See, e.g., Multivariate normal distribution.

The differential entropy of a Gaussian density is defined as:

$$H[p] = \frac{k}{2}(1 + \ln(2\pi)) + \frac{1}{2} \ln \vert \Sigma \vert\;,$$

where $k$ is the dimensionality of your space, i.e., in your case $k=3$.
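As a minimal numerical sketch of this formula (the example covariance matrix is an assumption for illustration), the entropy can be computed directly from $\ln\vert\Sigma\vert$ using `numpy.linalg.slogdet`, which is numerically safer than taking `log(det(Sigma))`:

```python
import numpy as np

def gaussian_entropy(Sigma):
    """Differential entropy of a Gaussian:
    H = (k/2) * (1 + ln(2*pi)) + (1/2) * ln|Sigma|."""
    k = Sigma.shape[0]
    sign, logdet = np.linalg.slogdet(Sigma)  # log-determinant, avoids overflow
    return 0.5 * k * (1 + np.log(2 * np.pi)) + 0.5 * logdet

# Hypothetical 3D covariance matrix (k = 3)
Sigma = np.diag([1.0, 2.0, 3.0])
print(gaussian_entropy(Sigma))
```

A larger $\vert \Sigma \vert$ directly increases the entropy, which matches the intuition that more dispersed data carries more uncertainty.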

$\Sigma$ is positive semi-definite, which means $\vert \Sigma \vert \geq 0$.

The larger $\vert \Sigma \vert$, the more dispersed your data points are. If $\vert \Sigma \vert = 0$, your data points do not 'occupy the whole space': they lie, e.g., on a line or a plane within $\mathbb{R}^3$. $\vert \Sigma \vert$ is also called the generalized variance. Alexander Vigodner is right: it captures the volume of your data cloud.
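Both effects are easy to check empirically. In this sketch (the data is randomly generated for illustration), a widely dispersed 3D cloud has a large determinant, while points constrained to a plane in $\mathbb{R}^3$ give a (numerically) zero one:

```python
import numpy as np

rng = np.random.default_rng(0)

# Dispersed cloud: full-rank covariance, clearly positive determinant.
dispersed = rng.normal(scale=5.0, size=(1000, 3))

# Flat cloud: z is a linear combination of x and y, so the points lie
# on a plane in R^3 and the covariance matrix is singular.
xy = rng.normal(size=(1000, 2))
flat = np.column_stack([xy, xy @ np.array([1.0, -2.0])])

det_dispersed = np.linalg.det(np.cov(dispersed, rowvar=False))
det_flat = np.linalg.det(np.cov(flat, rowvar=False))
print(det_dispersed)  # large
print(det_flat)       # ~ 0
```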

Since the sample covariance matrix is defined as $$\Sigma = \frac{1}{N-1} \sum_{i=1}^N (\vec{x}_i - \vec{\mu})(\vec{x}_i - \vec{\mu})^T\;, $$ it follows that it does not capture any information about the mean. You can verify this easily by adding a large constant vectorial shift to your data; $\vert \Sigma \vert$ should not change.
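That shift-invariance check can be sketched as follows (the particular shift vector is an arbitrary choice for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3))

# Shift every point by the same large constant vector.
shift = np.array([1e3, -2e3, 5e3])

det_before = np.linalg.det(np.cov(X, rowvar=False))
det_after = np.linalg.det(np.cov(X + shift, rowvar=False))
print(det_before, det_after)  # essentially identical
```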

I don't want to go too much into detail, but there is also a connection to PCA. The eigenvalues $\lambda_1, \lambda_2, \lambda_3$ of $\Sigma$ are the variances along the principal component axes of your data, and $\vert \Sigma \vert$ is their product, because by definition the determinant of a matrix is equal to the product of its eigenvalues.
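The eigenvalue-product identity can be verified numerically on correlated sample data (the mixing matrix below is an arbitrary assumption to make the coordinates correlated):

```python
import numpy as np

rng = np.random.default_rng(2)
# Mix independent coordinates to get a correlated 3D data cloud.
X = rng.normal(size=(400, 3)) @ rng.normal(size=(3, 3))
Sigma = np.cov(X, rowvar=False)

# Eigenvalues of Sigma = variances along the principal axes.
eigvals = np.linalg.eigvalsh(Sigma)
print(np.prod(eigvals), np.linalg.det(Sigma))  # equal up to rounding
```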

Note that the largest eigenvalue corresponds to the direction of maximal variance in your data (given by the corresponding eigenvector; see PCA).