The maximum mean discrepancy (MMD) with a Gaussian RBF kernel. This is a cool distance, not yet very well known in the statistics community, that takes a bit of math to define.
Letting $$k(x, y) := \exp\left( - \frac{1}{2 \sigma^2} \lVert x - y \rVert^2 \right),$$
define the Hilbert space $\mathcal{H}$ as the reproducing kernel Hilbert space corresponding to $k$: $k(x, y) = \langle \varphi(x), \varphi(y) \rangle_{\mathcal H}$.
Define the mean map kernel as
$$
K(P, Q)
= \E_{X \sim P, Y \sim Q} k(X, Y)
= \langle \E_{X \sim P} \varphi(X), \E_{Y \sim Q} \varphi(Y) \rangle_{\mathcal H}
.$$
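If you only have samples from $P$ and $Q$, you can estimate $K(P, Q)$ by averaging the kernel over all pairs of samples. A minimal sketch (my code, assuming NumPy; the function name is just for illustration):

```python
import numpy as np

def mean_map_kernel(X, Y, sigma):
    """Monte Carlo estimate of K(P, Q) from samples X (n, d) and Y (m, d)."""
    # Pairwise squared distances via ||x - y||^2 = ||x||^2 + ||y||^2 - 2 <x, y>.
    sq_dists = (
        (X ** 2).sum(axis=1)[:, None]
        + (Y ** 2).sum(axis=1)[None, :]
        - 2 * X @ Y.T
    )
    # Average the Gaussian RBF kernel over all (x_i, y_j) pairs.
    return np.exp(-sq_dists / (2 * sigma ** 2)).mean()
```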
The MMD is then
\begin{align}
\MMD(P, Q)
&= \lVert \E_{X \sim P}[\varphi(X)] - \E_{Y \sim Q}[\varphi(Y)] \rVert_{\mathcal H}
\\&= \sqrt{K(P, P) + K(Q, Q) - 2 K(P, Q)}
\\&= \sup_{f : \lVert f \rVert_{\mathcal H} \le 1} \E_{X \sim P} f(X) - \E_{Y \sim Q} f(Y)
.\end{align}
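Plugging three such estimates into the second formula above gives the standard plug-in (biased, V-statistic) estimator of the MMD; the unbiased variant drops the diagonal $k(x_i, x_i)$ terms. A sketch, reusing `mean_map_kernel` from above:

```python
def mmd(X, Y, sigma):
    """Plug-in estimate of MMD(P, Q) from samples X ~ P and Y ~ Q."""
    return np.sqrt(
        mean_map_kernel(X, X, sigma)
        + mean_map_kernel(Y, Y, sigma)
        - 2 * mean_map_kernel(X, Y, sigma)
    )
```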
For our mixtures $P$ and $Q$,
note that
$$
K(P, Q) = \sum_{i, j} \alpha_i \beta_j K(P_i, Q_j)
$$
and similarly for $K(P, P)$ and $K(Q, Q)$.
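So once you have the component values $K(P_i, Q_j)$, the mixture value is just a bilinear form in the weight vectors. A one-line sketch (names hypothetical):

```python
def mixture_mean_map_kernel(alpha, beta, Kmat):
    """K(P, Q) = sum_{i,j} alpha_i beta_j K(P_i, Q_j), with Kmat[i, j] = K(P_i, Q_j)."""
    return alpha @ Kmat @ beta
```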
It turns out, using similar tricks as for $L_2$, that $K(\N(\mu, \Sigma), \N(\mu', \Sigma'))$ is
$$
(2 \pi \sigma^2)^{d/2} \N(\mu; \mu', \Sigma + \Sigma' + \sigma^2 I)
.$$
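In code, that closed form is just a Gaussian density evaluation. A sketch assuming SciPy (and NumPy as above):

```python
from scipy.stats import multivariate_normal

def gaussian_mean_map_kernel(mu, Sigma, mu2, Sigma2, sigma):
    """Closed-form K(N(mu, Sigma), N(mu2, Sigma2)) for the Gaussian RBF kernel."""
    d = len(mu)
    cov = Sigma + Sigma2 + sigma ** 2 * np.eye(d)
    # (2 pi sigma^2)^{d/2} * N(mu; mu2, Sigma + Sigma2 + sigma^2 I)
    return (2 * np.pi * sigma ** 2) ** (d / 2) * multivariate_normal.pdf(
        mu, mean=mu2, cov=cov
    )
```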
As $\sigma \to 0$, $\MMD(P, Q) / (2 \pi \sigma^2)^{d/4}$ converges to the $L_2$ distance between the densities, so for small $\sigma$ the MMD is essentially a (vanishing) multiple of the $L_2$ distance. You'd normally want to use a different $\sigma$, though, one on the scale of the data variation.
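A common rule of thumb for that (my suggestion, not something the closed form requires) is the median heuristic: set $\sigma$ to the median pairwise distance among the pooled samples.

```python
def median_heuristic(X, Y):
    """Median of all pairwise distances in the pooled sample."""
    Z = np.vstack([X, Y])
    dists = np.sqrt(((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1))
    return np.median(dists[np.triu_indices_from(dists, k=1)])
```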
Closed forms are also available for polynomial kernels $k$ in the MMD; see
Muandet, Fukumizu, Dinuzzo, and Schölkopf (2012). Learning from Distributions via Support Measure Machines. In Advances in Neural Information Processing Systems. arXiv:1202.6504.
For a lot of nice properties of this distance, see
Sriperumbudur, Gretton, Fukumizu, Schölkopf, and Lanckriet (2010). Hilbert space embeddings and metrics on probability measures. Journal of Machine Learning Research, 11, 1517–1561. arXiv:0907.5309.
Best Answer
You can use any of the norms $\| A - B \|_p$ (see Wikipedia for a variety of matrix norms). Note that the square root of the sum of squared entries, $\sqrt{\sum_{i,j} (a_{ij}-b_{ij})^2}$, is called the Frobenius norm, and is different from the $L_2$ (spectral) norm, which is the square root of the largest eigenvalue of $(A-B)^2$, although of course they generate the same topology. The K-L divergence between two normal distributions with the same means (say zero) and covariance matrices $B$ and $A$, namely $\mathrm{KL}\bigl(\N(0, B) \,\|\, \N(0, A)\bigr)$, is also available on Wikipedia: $\frac12 \left[ \operatorname{tr}(A^{-1}B) - d - \ln\left( |B|/|A| \right) \right]$, where $d$ is the dimension.
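A sketch of those three quantities in code (my addition, assuming NumPy):

```python
import numpy as np

def frobenius_norm(A, B):
    return np.linalg.norm(A - B, ord="fro")

def spectral_norm(A, B):
    # Largest singular value; for symmetric A - B this equals the
    # square root of the largest eigenvalue of (A - B)^2.
    return np.linalg.norm(A - B, ord=2)

def kl_gaussian(A, B):
    """KL( N(0, B) || N(0, A) ) = (1/2) [ tr(A^{-1} B) - d - ln(|B|/|A|) ]."""
    d = A.shape[0]
    M = np.linalg.solve(A, B)  # A^{-1} B, with det(M) = |B| / |A|
    return 0.5 * (np.trace(M) - d - np.log(np.linalg.det(M)))
```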
Edit: if one of the matrices is a model-implied matrix and the other is the sample covariance matrix, then of course you can form a likelihood ratio test between the two. My personal favorite collection of such tests for simple structures is given in Rencher (2002), Methods of Multivariate Analysis. More advanced cases are covered in covariance structure modeling, for which a reasonable starting point is Bollen (1989), Structural Equations with Latent Variables.
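For the simplest such test, against a fully specified $\Sigma_0$, the classical likelihood ratio statistic has the form $n \left[ \operatorname{tr}(\Sigma_0^{-1} S) - \ln \lvert \Sigma_0^{-1} S \rvert - p \right]$, asymptotically $\chi^2$ with $p(p+1)/2$ degrees of freedom. A hedged sketch of that textbook result (variants use $n - 1$ and small-sample corrections; see Rencher for details):

```python
import numpy as np
from scipy.stats import chi2

def lrt_covariance(S, Sigma0, n):
    """LRT of H0: Sigma = Sigma0, given sample covariance S from n observations."""
    p = S.shape[0]
    M = np.linalg.solve(Sigma0, S)  # Sigma0^{-1} S
    stat = n * (np.trace(M) - np.log(np.linalg.det(M)) - p)
    pval = chi2.sf(stat, df=p * (p + 1) // 2)
    return stat, pval
```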