The maximum mean discrepancy (MMD) with a Gaussian RBF kernel. This is a cool distance, not yet very well known in the statistics community, that takes a bit of math to define.
Letting $$k(x, y) := \exp\left( - \frac{1}{2 \sigma^2} \lVert x - y \rVert^2 \right),$$
define the Hilbert space $\mathcal{H}$ as the reproducing kernel Hilbert space corresponding to $k$: $k(x, y) = \langle \varphi(x), \varphi(y) \rangle_{\mathcal H}$.
Define the mean map kernel as
$$
K(P, Q)
= \E_{X \sim P, Y \sim Q} k(X, Y)
= \langle \E_{X \sim P} \varphi(X), \E_{Y \sim Q} \varphi(Y) \rangle_{\mathcal H}
.$$
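If you only have samples from $P$ and $Q$, you can estimate $K(P, Q)$ by averaging the kernel over all pairs of samples. A minimal sketch (my code, assuming NumPy; the function name is just for illustration):

```python
import numpy as np

def mean_map_kernel(X, Y, sigma):
    """Monte Carlo estimate of K(P, Q) from samples X (n, d) and Y (m, d)."""
    # Pairwise squared distances via ||x - y||^2 = ||x||^2 + ||y||^2 - 2 <x, y>.
    sq_dists = (
        (X ** 2).sum(axis=1)[:, None]
        + (Y ** 2).sum(axis=1)[None, :]
        - 2 * X @ Y.T
    )
    # Average the Gaussian RBF kernel over all (x_i, y_j) pairs.
    return np.exp(-sq_dists / (2 * sigma ** 2)).mean()
```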
The MMD is then
\begin{align}
\MMD(P, Q)
&= \lVert \E_{X \sim P}[\varphi(X)] - \E_{Y \sim Q}[\varphi(Y)] \rVert_{\mathcal H}
\\&= \sqrt{K(P, P) + K(Q, Q) - 2 K(P, Q)}
\\&= \sup_{f : \lVert f \rVert_{\mathcal H} \le 1} \E_{X \sim P} f(X) - \E_{Y \sim Q} f(Y)
.\end{align}
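Plugging three such estimates into the second formula above gives the standard plug-in (biased, V-statistic) estimator of the MMD; the unbiased variant drops the diagonal $k(x_i, x_i)$ terms. A sketch, reusing `mean_map_kernel` from above:

```python
def mmd(X, Y, sigma):
    """Plug-in estimate of MMD(P, Q) from samples X ~ P and Y ~ Q."""
    return np.sqrt(
        mean_map_kernel(X, X, sigma)
        + mean_map_kernel(Y, Y, sigma)
        - 2 * mean_map_kernel(X, Y, sigma)
    )
```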
For our mixtures $P$ and $Q$,
note that
$$
K(P, Q) = \sum_{i, j} \alpha_i \beta_j K(P_i, Q_j)
$$
and similarly for $K(P, P)$ and $K(Q, Q)$.
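So once you have the component values $K(P_i, Q_j)$, the mixture value is just a bilinear form in the weight vectors. A one-line sketch (names hypothetical):

```python
def mixture_mean_map_kernel(alpha, beta, Kmat):
    """K(P, Q) = sum_{i,j} alpha_i beta_j K(P_i, Q_j), with Kmat[i, j] = K(P_i, Q_j)."""
    return alpha @ Kmat @ beta
```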
It turns out, using similar tricks as for $L_2$, that $K(\N(\mu, \Sigma), \N(\mu', \Sigma'))$ is
$$
(2 \pi \sigma^2)^{d/2} \N(\mu; \mu', \Sigma + \Sigma' + \sigma^2 I)
.$$
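In code, that closed form is just a Gaussian density evaluation. A sketch assuming SciPy (and NumPy as above):

```python
from scipy.stats import multivariate_normal

def gaussian_mean_map_kernel(mu, Sigma, mu2, Sigma2, sigma):
    """Closed-form K(N(mu, Sigma), N(mu2, Sigma2)) for the Gaussian RBF kernel."""
    d = len(mu)
    cov = Sigma + Sigma2 + sigma ** 2 * np.eye(d)
    # (2 pi sigma^2)^{d/2} * N(mu; mu2, Sigma + Sigma2 + sigma^2 I)
    return (2 * np.pi * sigma ** 2) ** (d / 2) * multivariate_normal.pdf(
        mu, mean=mu2, cov=cov
    )
```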
As $\sigma \to 0$, $\MMD(P, Q) / (2 \pi \sigma^2)^{d/4}$ converges to the $L_2$ distance between the densities, so for small $\sigma$ the MMD is essentially a (vanishing) multiple of the $L_2$ distance. You'd normally want to use a different $\sigma$, though, one on the scale of the data variation.
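A common rule of thumb for that (my suggestion, not something the closed form requires) is the median heuristic: set $\sigma$ to the median pairwise distance among the pooled samples.

```python
def median_heuristic(X, Y):
    """Median of all pairwise distances in the pooled sample."""
    Z = np.vstack([X, Y])
    dists = np.sqrt(((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1))
    return np.median(dists[np.triu_indices_from(dists, k=1)])
```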
Closed forms are also available for polynomial kernels $k$ in the MMD; see
Muandet, Fukumizu, Dinuzzo, and Schölkopf (2012). Learning from Distributions via Support Measure Machines. In Advances in Neural Information Processing Systems. arXiv:1202.6504.
For a lot of nice properties of this distance, see
Sriperumbudur, Gretton, Fukumizu, Schölkopf, and Lanckriet (2010). Hilbert space embeddings and metrics on probability measures. Journal of Machine Learning Research, 11, 1517–1561. arXiv:0907.5309.
Best Answer
You can use any of the norms $\| A - B \|_p$ (see Wikipedia for a variety of matrix norms). Note that the square root of the sum of squared entries, $\sqrt{\sum_{i,j} (a_{ij}-b_{ij})^2}$, is called the Frobenius norm, and is different from the $L_2$ (spectral) norm, which is the square root of the largest eigenvalue of $(A-B)^2$, although of course they generate the same topology. The K-L divergence between two normal distributions with the same means (say zero) and covariance matrices $B$ and $A$, namely $\mathrm{KL}\bigl(\N(0, B) \,\|\, \N(0, A)\bigr)$, is also available on Wikipedia: $\frac12 \left[ \operatorname{tr}(A^{-1}B) - d - \ln\left( |B|/|A| \right) \right]$, where $d$ is the dimension.
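A sketch of those three quantities in code (my addition, assuming NumPy):

```python
import numpy as np

def frobenius_norm(A, B):
    return np.linalg.norm(A - B, ord="fro")

def spectral_norm(A, B):
    # Largest singular value; for symmetric A - B this equals the
    # square root of the largest eigenvalue of (A - B)^2.
    return np.linalg.norm(A - B, ord=2)

def kl_gaussian(A, B):
    """KL( N(0, B) || N(0, A) ) = (1/2) [ tr(A^{-1} B) - d - ln(|B|/|A|) ]."""
    d = A.shape[0]
    M = np.linalg.solve(A, B)  # A^{-1} B, with det(M) = |B| / |A|
    return 0.5 * (np.trace(M) - d - np.log(np.linalg.det(M)))
```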
Edit: if one of the matrices is a model-implied matrix and the other is the sample covariance matrix, then of course you can form a likelihood ratio test between the two. My personal favorite collection of such tests for simple structures is given in Rencher (2002), Methods of Multivariate Analysis. More advanced cases are covered in covariance structure modeling, for which a reasonable starting point is Bollen (1989), Structural Equations with Latent Variables.
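For the simplest such test, against a fully specified $\Sigma_0$, the classical likelihood ratio statistic has the form $n \left[ \operatorname{tr}(\Sigma_0^{-1} S) - \ln \lvert \Sigma_0^{-1} S \rvert - p \right]$, asymptotically $\chi^2$ with $p(p+1)/2$ degrees of freedom. A hedged sketch of that textbook result (variants use $n - 1$ and small-sample corrections; see Rencher for details):

```python
import numpy as np
from scipy.stats import chi2

def lrt_covariance(S, Sigma0, n):
    """LRT of H0: Sigma = Sigma0, given sample covariance S from n observations."""
    p = S.shape[0]
    M = np.linalg.solve(Sigma0, S)  # Sigma0^{-1} S
    stat = n * (np.trace(M) - np.log(np.linalg.det(M)) - p)
    pval = chi2.sf(stat, df=p * (p + 1) // 2)
    return stat, pval
```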