Solved – Do the Determinants of Covariance and Correlation Matrices and/or Their Inverses Have Useful Interpretations

correlation-matrix, covariance, covariance-matrix, determinant, self-study

While learning to calculate covariance and correlation matrices and their inverses in VB and T-SQL a few years ago, I learned that the various entries have interesting properties that can make them useful in the right data mining scenarios. One obvious example is the presence of variances on the diagonals of covariance matrices; some less obvious examples that I have yet to use, but could come in handy at some point, are the variance inflation factors in inverse correlation matrices and partial correlations in inverse covariance matrices.
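For concreteness, here is a minimal sketch of those three properties in NumPy (rather than the VB and T-SQL I actually work in); the data set is entirely made up, and the partial-correlation formula is the standard one derived from the precision matrix:

```python
import numpy as np

# Hypothetical data: 200 observations of 3 correlated variables.
rng = np.random.default_rng(0)
X = rng.multivariate_normal(mean=[0, 0, 0],
                            cov=[[4.0, 1.5, 0.5],
                                 [1.5, 2.0, 0.3],
                                 [0.5, 0.3, 1.0]],
                            size=200)

S = np.cov(X, rowvar=False)        # covariance matrix
R = np.corrcoef(X, rowvar=False)   # correlation matrix

# The diagonal of the covariance matrix holds the sample variances.
print(np.allclose(np.diag(S), X.var(axis=0, ddof=1)))   # True

# The diagonal of the inverse correlation matrix gives the variance inflation factors.
print(np.diag(np.linalg.inv(R)))

# The inverse covariance (precision) matrix yields partial correlations:
# rho_ij = -P_ij / sqrt(P_ii * P_jj).
P = np.linalg.inv(S)
partial = -P / np.sqrt(np.outer(np.diag(P), np.diag(P)))
np.fill_diagonal(partial, 1.0)
print(partial)
```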

One thing I have yet to see directly addressed in the literature, however, is how to interpret the determinants of these matrices. Since determinants are frequently calculated for other types of matrices, I expected to find a slew of information on them, but I've turned up very little in casual searches of both the StackExchange forums and the rest of the Internet. Most of the mentions I have encountered revolve around using the determinants as a single step in the process of calculating other statistical tests and algorithms, such as Principal Components Analysis (PCA) and one of Hotelling's tests; none directly addresses how to interpret these determinants on their own. Is there a practical reason why they're not discussed often in the literature on data mining?

More importantly, do they provide any useful information in a stand-alone fashion, and if so, how could I interpret the determinants of each? I realize that a determinant is a kind of signed volume induced by a linear transformation, so I suspect that the determinants of these particular matrices might signify some kind of volumetric measure of covariance or correlation over an entire set of attributes, or something to that effect (as opposed to ordinary covariance and correlation, which are measured between two attributes or variables). That also raises the question of what kind of volume their inverses would represent. I'm not familiar enough with the topic or the heavy matrix math involved to speculate further, but I am capable of coding all four types of matrices and their determinants.

My question is not pressing, but in the long run I will have to decide whether it's worthwhile to regularly include these matrices and their determinants in my exploratory data mining processes. It's cheaper to just calculate covariance and correlation in a one-on-one, bivariate manner in these particular languages, but I'll go the extra mile and implement determinant calculations if I can derive some deeper insights that justify the expense in terms of programming resources. Thanks in advance.

Best Answer

I was able to cobble together some general principles, use cases and properties of these matrices from a desultory set of sources; few of them address these topics directly, and most merely mention them in passing. Since determinants represent signed volumes, I expected those pertaining to these four types of matrices to translate into multidimensional association measures of some sort; this turned out to be true to some extent, but a few of them exhibit interesting properties:
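(For reference, the four determinants under discussion can be sketched in a few lines of NumPy; the data here are made up and the specific numbers don't matter, only which matrix each determinant comes from.)

```python
import numpy as np

# Hypothetical data; in practice these columns would come from the tables being mined.
rng = np.random.default_rng(1)
X = rng.multivariate_normal([0, 0, 0],
                            [[4.0, 1.5, 0.5],
                             [1.5, 2.0, 0.3],
                             [0.5, 0.3, 1.0]], size=500)

S = np.cov(X, rowvar=False)       # covariance matrix
R = np.corrcoef(X, rowvar=False)  # correlation matrix
S_inv = np.linalg.inv(S)          # inverse covariance (precision) matrix
R_inv = np.linalg.inv(R)          # inverse correlation matrix

for name, M in [("covariance", S), ("correlation", R),
                ("inverse covariance", S_inv), ("inverse correlation", R_inv)]:
    print(f"det({name}) = {np.linalg.det(M):.6f}")
```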

Covariance Matrices:

• In the case of a Gaussian distribution, the determinant indirectly measures differential entropy, which can be construed as the dispersion of the data points across the volume they occupy. See tmp's answer at What does Determinant of Covariance Matrix give? for details (a short numeric sketch follows this list).

• Alexander Vigodner's answer in the same thread notes that it also possesses the property of positivity: because a covariance matrix is positive semi-definite, its determinant is never negative.

• The covariance matrix determinant can be interpreted as generalized variance. See the NIST Statistics Handbook page 6.5.3.2. Determinant and Eigenstructure.
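A minimal numeric sketch of both points, assuming a Gaussian model (the helper function name and the toy covariance matrices are mine):

```python
import numpy as np

def gaussian_differential_entropy(cov):
    """Differential entropy (in nats) of a multivariate Gaussian:
    h = 0.5 * ln((2*pi*e)^k * det(cov)), so a larger determinant
    means a more dispersed distribution."""
    k = cov.shape[0]
    _, logdet = np.linalg.slogdet(cov)
    return 0.5 * (k * np.log(2 * np.pi * np.e) + logdet)

tight = np.array([[1.0, 0.9], [0.9, 1.0]])   # strongly correlated
loose = np.array([[1.0, 0.0], [0.0, 1.0]])   # uncorrelated

# The determinant itself is the "generalized variance" of the NIST handbook;
# the differential entropy rises with it.
print(np.linalg.det(tight), gaussian_differential_entropy(tight))   # det 0.19, lower entropy
print(np.linalg.det(loose), gaussian_differential_entropy(loose))   # det 1.00, higher entropy
```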

Inverse Covariance Matrices:

• The determinant of the inverse covariance matrix is simply the reciprocal of the generalized variance that the covariance matrix determinant represents; maximizing the determinant of the inverse covariance matrix can apparently be used as a substitute for calculating the determinant of the Fisher information matrix, which is useful in optimizing experimental design. See kjetil b halvorsen's answer to the CV thread Determinant of Fisher Information.
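A quick numeric check of that reciprocal relationship (the covariance matrix below is arbitrary):

```python
import numpy as np

S = np.array([[4.0, 1.5, 0.5],
              [1.5, 2.0, 0.3],
              [0.5, 0.3, 1.0]])

# det(S^-1) = 1 / det(S): the determinant of the inverse covariance matrix
# is just the reciprocal of the generalized variance.
det_S = np.linalg.det(S)
det_S_inv = np.linalg.det(np.linalg.inv(S))
print(det_S, det_S_inv, np.isclose(det_S_inv, 1.0 / det_S))   # ..., ..., True
```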

Correlation Matrices:

• These are much more interesting than covariance matrix determinants: overall correlation decreases as the determinant approaches 1 and increases as it approaches 0, which is the opposite of ordinary correlation coefficients, in which higher numbers indicate greater positive correlation. "The determinant of the correlation matrix will equal 1.0 only if all correlations equal 0, otherwise the determinant will be less than 1. Remember that the determinant is related to the volume of the space occupied by the swarm of data points represented by standard scores on the measures involved. When the measures are uncorrelated, this space is a sphere with a volume of 1. When the measures are correlated, the space occupied becomes an ellipsoid whose volume is less than 1." See this set of Tulane course notes and this Quora page (a small numeric illustration follows this list).

• Another citation for this unexpected behavior: "The determinant of a correlation matrix becomes zero or near zero when some of the variables are perfectly correlated or highly correlated with each other." See Rakesh Pandey's question How to handle the problem of near zero determinant in computing reliability using SPSS?

• A third reference: "Having a very small det(R) only means that you have some variables that are almost linearly dependent." Carlos Massera Filho's answer at this CrossValidated thread.

• The determinants also follow a scale from 0 to 1, which differs from the -1 to 1 scale that correlation coefficients follow. They also lack the sign that an ordinary determinant may carry to express the orientation of a volume. Whether the correlation determinant still represents some notion of directionality was not addressed in any of the literature I found, though.
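A small illustration of the behavior referenced above; the data are synthetic, with one deliberately near-collinear pair of columns:

```python
import numpy as np

rng = np.random.default_rng(2)

# Nearly independent columns: det(R) comes out close to 1.
X_indep = rng.normal(size=(1000, 3))
R_indep = np.corrcoef(X_indep, rowvar=False)

# Nearly collinear columns: det(R) comes out close to 0.
base = rng.normal(size=1000)
X_coll = np.column_stack([base,
                          base + 0.05 * rng.normal(size=1000),
                          rng.normal(size=1000)])
R_coll = np.corrcoef(X_coll, rowvar=False)

print(np.linalg.det(R_indep))   # close to 1: little overall correlation
print(np.linalg.det(R_coll))    # close to 0: strong (near-linear) dependence
```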

Inverse Correlation Matrices:

• A Google search for the combined terms "inverse correlation matrix" and "determinant" turned up only 50 hits, so apparently they're not commonly applied to statistical reasoning.

• Apparently, minimizing the inverse correlation determinant can be useful in some situations, given that a patent for echo cancellation using adaptive filters contains a regularization procedure designed to do just that. See p. 5 in this patent document.

• p. 5 of Robust Technology with Analysis of Interference in Signal Processing (available in Google Books previews) by Telman Aliev seems to suggest that "poor stipulation" (presumably poor conditioning) of a correlation matrix is related to instability in the determinant of its inverse. In other words, wild swings in the determinant in response to small changes in the matrix's constituent elements are related to how much information the correlation matrix captures.
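A toy illustration of that instability for a 2x2 correlation matrix (the helper function and the specific r values are mine):

```python
import numpy as np

def inv_corr_det(r):
    """Determinant of the inverse of a 2x2 correlation matrix
    with off-diagonal correlation r; algebraically this equals 1 / (1 - r**2)."""
    R = np.array([[1.0, r], [r, 1.0]])
    return np.linalg.det(np.linalg.inv(R))

# Near-perfect correlation: a tiny change in r swings the determinant wildly.
print(inv_corr_det(0.990))   # ~50
print(inv_corr_det(0.999))   # ~500
# Moderate correlation: the same-sized change in r barely moves it.
print(inv_corr_det(0.490))   # ~1.32
print(inv_corr_det(0.499))   # ~1.33
```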

There may be other properties and use cases of these determinants not listed here; I'll just post these for the sake of completeness and to provide an answer to the question I posed, in case someone else runs into practical uses for these interpretations (as I have with correlation determinants).