The maximum mean discrepancy (MMD) with a Gaussian RBF kernel. This is a cool distance, not yet very well known in the statistics community, that takes a bit of math to define.
Letting $$k(x, y) := \exp\left( - \frac{1}{2 \sigma^2} \lVert x - y \rVert^2 \right),$$
define the Hilbert space $\mathcal{H}$ as the reproducing kernel Hilbert space corresponding to $k$, with feature map $\varphi$: $k(x, y) = \langle \varphi(x), \varphi(y) \rangle_{\mathcal H}$.
Define the mean map kernel as
$$
K(P, Q)
= \E_{X \sim P, Y \sim Q} k(X, Y)
= \langle \E_{X \sim P} \varphi(X), \E_{Y \sim Q} \varphi(Y) \rangle_{\mathcal H}
.$$
The MMD is then
\begin{align}
\MMD(P, Q)
&= \lVert \E_{X \sim P}[\varphi(X)] - \E_{Y \sim Q}[\varphi(Y)] \rVert_{\mathcal H}
\\&= \sqrt{K(P, P) + K(Q, Q) - 2 K(P, Q)}
\\&= \sup_{f : \lVert f \rVert_{\mathcal H} \le 1} \E_{X \sim P} f(X) - \E_{Y \sim Q} f(Y)
.\end{align}
For our mixtures $P$ and $Q$,
note that
$$
K(P, Q) = \sum_{i, j} \alpha_i \beta_j K(P_i, Q_j)
$$
and similarly for $K(P, P)$ and $K(Q, Q)$.
It turns out, using tricks similar to those for $L_2$, that $K(\N(\mu, \Sigma), \N(\mu', \Sigma'))$ is
$$
(2 \pi \sigma^2)^{d/2} \N(\mu; \mu', \Sigma + \Sigma' + \sigma^2 I)
.$$
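A sketch of why, assuming $X \sim \N(\mu, \Sigma)$ and $Y \sim \N(\mu', \Sigma')$ are drawn independently (the integration variable $z$ is just notation introduced here): the kernel is a rescaled Gaussian density in $x - y$, and the integral of a product of two Gaussian densities is again a Gaussian density.
\begin{align}
k(x, y)
&= (2 \pi \sigma^2)^{d/2} \N(x - y; 0, \sigma^2 I)
\\ X - Y &\sim \N(\mu - \mu', \Sigma + \Sigma')
\\ \E k(X, Y)
&= (2 \pi \sigma^2)^{d/2} \int \N(z; \mu - \mu', \Sigma + \Sigma') \, \N(z; 0, \sigma^2 I) \, dz
\\&= (2 \pi \sigma^2)^{d/2} \N(\mu - \mu'; 0, \Sigma + \Sigma' + \sigma^2 I)
\\&= (2 \pi \sigma^2)^{d/2} \N(\mu; \mu', \Sigma + \Sigma' + \sigma^2 I)
.\end{align}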
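As $\sigma \to 0$, the kernel value itself goes to zero, but $K(P, Q) / (2 \pi \sigma^2)^{d/2}$ converges to $\int p(x) q(x) \, dx$, where $p$ and $q$ are the densities of $P$ and $Q$. So the MMD, rescaled by $(2 \pi \sigma^2)^{-d/4}$, converges to the $L_2$ distance between the densities. You'd normally want to use a different $\sigma$, though, one on the scale of the data variation.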
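
Putting the pieces together, here is a minimal NumPy sketch of the closed-form MMD between two Gaussian mixtures. The function names (`gaussian_density`, `mean_map_kernel`, `mmd`) and the representation of a mixture as a `(weights, means, covs)` tuple are my own choices for illustration, not anything from a library, and the double loop favors readability over speed.

```python
import numpy as np


def gaussian_density(x, mean, cov):
    """Density of N(mean, cov) evaluated at the point x."""
    d = x.shape[0]
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    quad = diff @ np.linalg.solve(cov, diff)
    return np.exp(-0.5 * (d * np.log(2 * np.pi) + logdet + quad))


def mean_map_kernel(p, q, sigma):
    """K(P, Q) for Gaussian mixtures p and q, each given as (weights, means, covs)."""
    weights_p, means_p, covs_p = p
    weights_q, means_q, covs_q = q
    d = means_p[0].shape[0]
    scale = (2 * np.pi * sigma**2) ** (d / 2)
    total = 0.0
    # Bilinearity: K(P, Q) = sum_{i,j} alpha_i beta_j K(P_i, Q_j).
    for a, mu, cov_a in zip(weights_p, means_p, covs_p):
        for b, nu, cov_b in zip(weights_q, means_q, covs_q):
            cov = cov_a + cov_b + sigma**2 * np.eye(d)
            total += a * b * gaussian_density(mu, nu, cov)
    return scale * total


def mmd(p, q, sigma):
    """MMD(P, Q) = sqrt(K(P, P) + K(Q, Q) - 2 K(P, Q))."""
    sq = (mean_map_kernel(p, p, sigma)
          + mean_map_kernel(q, q, sigma)
          - 2 * mean_map_kernel(p, q, sigma))
    # Clip tiny negative values caused by floating-point round-off.
    return np.sqrt(max(sq, 0.0))


# Example: two 2-d mixtures with two components each (made-up numbers).
P = ([0.5, 0.5],
     [np.array([0.0, 0.0]), np.array([3.0, 0.0])],
     [np.eye(2), np.eye(2)])
Q = ([0.3, 0.7],
     [np.array([0.0, 1.0]), np.array([3.0, 1.0])],
     [np.eye(2), 2.0 * np.eye(2)])
print(mmd(P, Q, sigma=1.0))
```

Here `sigma` is the kernel bandwidth $\sigma$ from above; as noted, you'd usually pick it on the scale of the data variation.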
Closed forms are also available for polynomial kernels $k$ in the MMD; see
Muandet, Fukumizu, Dinuzzo, and Schölkopf (2012). Learning from Distributions via Support Measure Machines. In Advances in Neural Information Processing Systems. arXiv:1202.6504.
For a lot of nice properties of this distance, see
Sriperumbudur, Gretton, Fukumizu, Schölkopf, and Lanckriet (2010). Hilbert space embeddings and metrics on probability measures. Journal of Machine Learning Research, 11, 1517–1561. arXiv:0907.5309.
Best Answer
If you are using R, this is a really nice page (http://www.statmethods.net/advstats/cluster.html) that steps through a few different methods to help in identifying the optimal number of clusters. HTH.