I was able to cobble together some general principles, use cases and properties of these matrices from a desultory set of sources; few of them address these topics directly, and most mention them only in passing. Since determinants represent signed volumes, I expected the determinants of these four types of matrices to translate into multidimensional association measures of some sort; this turned out to be true to some extent, but a few of them exhibit interesting properties:
Covariance Matrices:
• In the case of a Gaussian distribution, the determinant indirectly measures differential entropy, which can be construed as the dispersion of the data points over the volume they occupy. See tmp's answer at What does Determinant of Covariance Matrix give? for details.
• Alexander Vigodner's answer in the same thread notes that it also possesses the property of positivity (a covariance matrix is positive semi-definite, so its determinant cannot be negative).
• The covariance matrix determinant can be interpreted as generalized variance. See the NIST Statistics Handbook page 6.5.3.2. Determinant and Eigenstructure.
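To make the "generalized variance" reading concrete, here is a minimal NumPy sketch (the data and the scale factor are made up purely for illustration): the determinant of the sample covariance matrix grows as the individual variances grow.

```python
# A minimal sketch, assuming NumPy is available: the determinant of a sample
# covariance matrix read as the "generalized variance" of the data.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))          # 500 observations, 3 variables
cov = np.cov(X, rowvar=False)          # 3 x 3 sample covariance matrix
gen_var = np.linalg.det(cov)           # generalized variance

# Scaling one variable inflates its variance and hence the generalized variance.
X_scaled = X * np.array([1.0, 1.0, 3.0])
gen_var_scaled = np.linalg.det(np.cov(X_scaled, rowvar=False))
print(gen_var, gen_var_scaled)         # the second value is roughly 9x the first
```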
Inverse Covariance Matrices:
• Since $\det(\Sigma^{-1}) = 1/\det(\Sigma)$, it's equivalent to the inverse of the generalized variance that the covariance matrix determinant represents; maximizing the determinant of the inverse covariance matrix can apparently be used as a substitute for calculating the determinant of the Fisher information matrix, which is useful in optimizing experimental design. See kjetil b halvorsen's answer to the CV thread Determinant of Fisher Information.
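A rough toy illustration of that idea (my own example, not taken from the cited answer): for a linear model $y = X\beta + \varepsilon$, the Fisher information is proportional to $X^T X$, which is the inverse covariance of the OLS estimator up to a noise factor, so a D-optimal design picks the design matrix with the largest $\det(X^T X)$.

```python
# My own toy sketch of the D-optimality idea: larger det(X'X) means a
# tighter (smaller-volume) confidence ellipsoid for the estimated coefficients.
import numpy as np

def d_criterion(X):
    """det(X'X), the D-optimality criterion for a design matrix X."""
    return np.linalg.det(X.T @ X)

# Two candidate two-factor designs with four runs each.
design_a = np.array([[ 1.0,  1.0], [ 1.0, -1.0], [-1.0,  1.0], [-1.0, -1.0]])  # full factorial
design_b = np.array([[ 1.0,  1.0], [ 1.0,  0.9], [-1.0, -1.0], [-1.0, -0.9]])  # nearly collinear columns

print(d_criterion(design_a), d_criterion(design_b))  # 16.0 vs ~0.04: the factorial design wins
```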
Correlation Matrices:
• These are much more interesting than covariance matrix determinants, in that the overall degree of correlation decreases as the determinant approaches 1 and increases as the determinant approaches 0. This is the opposite of ordinary correlation coefficients, in which higher numbers indicate greater positive correlation. "The determinant of the correlation matrix will equal 1.0 only if all correlations equal 0, otherwise the determinant will be less than 1. Remember that the determinant is related to the volume of the space occupied by the swarm of data points represented by standard scores on the measures involved. When the measures are uncorrelated, this space is a sphere with a volume of 1. When the measures are correlated, the space occupied becomes an ellipsoid whose volume is less than 1." See this set of Tulane course notes and this Quora page.
• Another citation for this unexpected behavior: "The determinant of a correlation matrix becomes zero or near zero when some of the variables are perfectly correlated or highly correlated with each other." See Rakesh Pandey's question How to handle the problem of near zero determinant in computing reliability using SPSS?
• A third reference: "Having a very small det(R) only means that you have some variables that are almost linearly dependent." See Carlos Massera Filho's answer at this CrossValidated thread.
• Correlation determinants also follow a scale from 0 to 1, which differs from the -1 to 1 scale that correlation coefficients follow; they likewise lack the sign that an ordinary determinant may exhibit in expressing the orientation of a volume. Whether or not the correlation determinant still represents some notion of directionality was not addressed in any of the literature I found, though. (A small numerical check of this behavior follows below.)
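Here is that check, using toy equicorrelation matrices of my own choosing: the determinant is 1 when all correlations are 0 and shrinks toward 0 as the variables become more strongly correlated.

```python
# A small numerical check, assuming NumPy is available: det(R) is 1 for an
# identity correlation matrix and approaches 0 as the common correlation grows.
import numpy as np

def corr_det(r):
    """Determinant of a 3x3 equicorrelation matrix with off-diagonal value r."""
    R = np.full((3, 3), r)
    np.fill_diagonal(R, 1.0)
    return np.linalg.det(R)

for r in (0.0, 0.3, 0.6, 0.9, 0.99):
    print(r, corr_det(r))   # 1.0, then steadily smaller, approaching 0 near r = 1
```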
Inverse Correlation Matrices:
• A Google search for the combined terms "inverse correlation matrix" and "determinant" turned up only 50 hits, so apparently they're not commonly applied in statistical reasoning.
• Apparently minimization of the inverse correlation determinant can be useful in some situations, given that a patent exists for echo cancellation using adaptive filters that contains a regularization procedure designed to do just that. See p. 5 in this patent document.
• p. 5 of Robust Technology with Analysis of Interference in Signal Processing (available in Google Books previews) by Telman Aliev seems to suggest that "poor stipulation" (i.e. ill-conditioning) of a correlation matrix is related to instability in the determinant of the inverse correlation matrix. In other words, wild swings in that determinant in response to small changes in the matrix's constituent elements are related to how much information the correlation matrix captures. (A quick numerical illustration of that sensitivity follows this list.)
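The illustration below is my own toy example, not taken from the book: since det(R⁻¹) = 1/det(R), the determinant of the inverse correlation matrix explodes, and reacts violently to tiny perturbations, when the variables are nearly linearly dependent.

```python
# My own 2x2 sketch of that instability, assuming NumPy is available:
# det(inv(R)) = 1 / (1 - r**2) for a 2x2 correlation matrix with off-diagonal r.
import numpy as np

def inv_corr_det(r):
    R = np.array([[1.0, r], [r, 1.0]])
    return np.linalg.det(np.linalg.inv(R))

print(inv_corr_det(0.30), inv_corr_det(0.31))      # ~1.10 vs ~1.11: barely moves
print(inv_corr_det(0.9990), inv_corr_det(0.9991))  # ~500 vs ~556: a tiny change in r shifts it a lot
```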
There may be other properties and use cases of these determinants not listed here; I'll just post these for the sake of completeness and to provide an answer to the question I posed, in case someone else runs into practical uses for these interpretations (as I have with correlation determinants).
Best Answer
In the context of the Matrix Normal Distribution, the entries $X_{ij}$ are samples from a Gaussian distribution with mean $M_{ij}$. To fully characterise their variance, though, we need two matrices $U$ and $V$ that set the joint distribution of the row and column vectors of the random matrix $X$. These two covariance matrices describe how the rows (via $U$) and columns (via $V$) covary with each other, thus capturing the dependencies within the matrix $X$. A higher value in a diagonal element of $U$ implies greater variance in the corresponding row vector, while non-zero off-diagonal elements indicate covariance between different pairs of row vectors (the analogous statement holds for $V$ and the columns). To that effect, if the row vectors were independent, $U$ would be diagonal (and similarly, if the column vectors were independent, $V$ would be diagonal).
To aid our understanding, let's start with a simple multivariate normal distribution, i.e. begin with just a $k$-dimensional mean vector $\overrightarrow{\mu}$ and a covariance matrix $\Sigma$ of dimensions $k \times k$. Then let's say we want to generate three rows of length $k$ in one go, i.e. our $X$ has dimensions $3 \times k$. In that case, $V = \Sigma$ and $U = I_{3 \times 3}$, with $U$ controlling the covariance between the three rows and $V$ controlling the covariance between the columns. Because we want just three independent rows, our row covariance $U$ is an identity matrix.
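A minimal sketch of that example, assuming SciPy's `scipy.stats.matrix_normal` is available (the particular $k = 2$ covariance matrix here is just an illustration):

```python
# Draw a 3 x k matrix whose three rows are independent N(0, Sigma) vectors,
# i.e. rowcov = identity, colcov = Sigma.
import numpy as np
from scipy.stats import matrix_normal

k = 2
Sigma = np.array([[1.0, 0.8],
                  [0.8, 2.0]])          # V: covariance between the k columns
M = np.zeros((3, k))                    # 3 x k mean matrix
U = np.eye(3)                           # identity row covariance: the 3 rows are independent

X = matrix_normal.rvs(mean=M, rowcov=U, colcov=Sigma, random_state=0)
print(X.shape)   # (3, 2): three independent rows, each drawn from N(0, Sigma)
```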
Bringing this now all together: the entry $X_{ij}$ is marginally Gaussian with mean $M_{ij}$ and variance $U_{ii} V_{jj}$, and the entries covary according to $\mathrm{Cov}(X_{ij}, X_{kl}) = U_{ik} V_{jl}$, so dependence across the rows enters through $U$ and dependence across the columns through $V$.
Finally, about the calculation of $U$ and $V$: their maximum likelihood estimates don't have a closed form, so we cannot compute them directly. That said, we can evaluate the likelihood associated with any particular pair $U$ and $V$, so we can obtain maximum likelihood estimates numerically given a sample of matrices $X^1, X^2, \dots, X^n$. Both Python and R have packages for evaluating that likelihood (for example scipy.stats.matrix_normal and matrixNormal, respectively). I have also seen the paper An expectation–maximization algorithm for the matrix normal distribution with an application in remote sensing by Glanz & Carvalho, who use EM based on exactly that principle.