How do we interpret the covariance matrices $\textbf{U}$ and $\textbf{V}$ in the Matrix Variate Normal Distribution


Consider the Matrix Normal Distribution. My first question is: how do we interpret the entries $\textbf{X}_{ij}$ of the random matrix $\textbf{X}$ $(n\times p)$? My second question is: how do we interpret the so-called covariance matrices $\textbf{U}$ and $\textbf{V}$? Finally, my third question is: how do we calculate the matrices $\textbf{U}$ and $\textbf{V}$?

Any help would be appreciated.

Best Answer

In the context of the Matrix Normal Distribution, each entry $X_{ij}$ is marginally Gaussian with mean $M_{ij}$. To fully characterise the dependence structure, though, we need two matrices $U$ and $V$ that jointly determine the distribution of the row and column vectors of the random matrix $X$. These two covariance matrices describe how the rows (via $U$) and the columns (via $V$) covary with one another, thus capturing the dependencies within the matrix $X$. A larger diagonal element $U_{ii}$ implies greater variance for the $i$-th row, while a non-zero off-diagonal element $U_{ik}$ implies covariance between rows $i$ and $k$ (the analogy holds for $V$, but for columns). To that effect, if the row vectors were independent, $U$ would be diagonal (and similarly, if the column vectors were independent, $V$ would be diagonal).

To aid our understanding, let's start with a simple multivariate normal distribution, i.e. begin with just a $k$-dimensional mean vector $\overrightarrow{\mu}$ and a covariance matrix $\Sigma$ of dimensions $k \times k$. Then let's say we want to generate three rows of length $k$ in one go, i.e. our $X$ has dimensions $3 \times k$. In that case, $V = \Sigma$ and $U = I_{3 \times 3}$, with $U$ controlling the covariance between the three rows and $V$ controlling the covariance between the columns. Because we want just three independent rows, our covariance $U$ is an identity matrix.
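A minimal sketch of this special case, using `scipy.stats.matrix_normal` as mentioned below (the dimension $k = 4$ and the particular $\Sigma$ are arbitrary choices for illustration); it checks that the matrix-normal density with $U = I$, $V = \Sigma$ factorises into three independent multivariate-normal row densities:

```python
import numpy as np
from scipy.stats import matrix_normal, multivariate_normal

rng = np.random.default_rng(0)

k = 4
mu = np.zeros(k)
A = rng.standard_normal((k, k))
Sigma = A @ A.T + k * np.eye(k)   # an arbitrary valid covariance matrix

# Three independent rows, each ~ N(mu, Sigma):
M = np.tile(mu, (3, 1))           # 3 x k mean matrix
U = np.eye(3)                     # identity row covariance: rows independent
V = Sigma                         # column covariance = the MVN covariance

mn = matrix_normal(mean=M, rowcov=U, colcov=V)
X = mn.rvs(random_state=rng)      # one 3 x k draw

# The matrix-normal log-density equals the sum of the three row log-densities
lp_matrix = mn.logpdf(X)
lp_rows = sum(multivariate_normal(mu, Sigma).logpdf(X[i]) for i in range(3))
print(np.isclose(lp_matrix, lp_rows))
```

With a non-identity $U$ this factorisation no longer holds, which is exactly the point: $U$ ties the rows together.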

Bringing this all together: the entry $X_{ij}$ is Gaussian with mean $M_{ij}$ and variance $U_{ii}V_{jj}$. More generally, $\operatorname{Cov}(X_{ij}, X_{kl}) = U_{ik}V_{jl}$, so the rows covary through $U$ and the columns through $V$; equivalently, $\operatorname{vec}(X) \sim \mathcal{N}(\operatorname{vec}(M),\ V \otimes U)$.
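This Kronecker structure can be verified directly with NumPy (the $2 \times 3$ shape and the particular entries of $U$ and $V$ are just a made-up example): column-stacking $\operatorname{vec}(X)$ puts $X_{ij}$ at position $jn + i$, and reading the corresponding entries of $V \otimes U$ recovers $U_{ik}V_{jl}$:

```python
import numpy as np

# Hypothetical small example: a 2 x 3 matrix normal
U = np.array([[2.0, 0.5],
              [0.5, 1.0]])           # row covariance (2 x 2)
V = np.array([[1.0, 0.3, 0.0],
              [0.3, 2.0, 0.4],
              [0.0, 0.4, 1.5]])      # column covariance (3 x 3)

n, p = U.shape[0], V.shape[0]
K = np.kron(V, U)                    # covariance of the column-stacked vec(X)

# Check Cov(X_ij, X_kl) = U_ik * V_jl for every pair of entries;
# in particular Var(X_ij) = U_ii * V_jj on the diagonal
for i in range(n):
    for j in range(p):
        for k in range(n):
            for l in range(p):
                assert np.isclose(K[j * n + i, l * n + k], U[i, k] * V[j, l])
print("Cov(X_ij, X_kl) = U_ik * V_jl verified")
```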

Finally, about the calculation of $U$ and $V$: their maximum likelihood estimates have no closed form, so we cannot compute them directly. That said, we can evaluate the likelihood of any particular pair $(U, V)$, so given a sample of matrices $X^1, X^2, \dots, X^n$ we can find maximum likelihood estimates numerically (note that $U$ and $V$ are only identified up to a scale factor, since $(cU, V/c)$ gives the same distribution). Both Python and R have packages that evaluate this likelihood (for example scipy.stats.matrix_normal and matrixNormal respectively). I have also seen the paper "An expectation–maximization algorithm for the matrix normal distribution with an application in remote sensing" by Glanz & Carvalho, who build an EM algorithm on exactly this principle.
