Probabilistic comparison of two mixture models

Tags: gaussian mixture distribution, hypothesis testing, normal distribution, t-test

Given two Gaussian mixture models (GMMs) with different degrees of freedom, is there a way to determine the probability that one is generated from the other? That is, can we assign a probability to the hypothesis that the two distributions are actually the same?

My current approach is to use a correlation score, which is easy to calculate but is not really a probability:

$$
C(p_1,p_2) = -\log \left[ \frac{\int p_1(x)\,p_2(x)\,dx}{\int \left[ p_1^2(x)+p_2^2(x) \right] dx}\right]
$$
For two GMMs $p_1(\boldsymbol r)=\sum_i\pi_i\phi_i(\boldsymbol r|\boldsymbol \mu_i,\boldsymbol \Sigma_i)$ and $p_2(\boldsymbol r)=\sum_j\pi_j\phi_j(\boldsymbol r|\boldsymbol \mu_j,\boldsymbol \Sigma_j)$ we have:
$$
\int p_1p_2 = \sum_{i,j} \pi_i\pi_j\int\phi_i(\boldsymbol r)\phi_j(\boldsymbol r)d\boldsymbol r
$$
The final integral is straightforward to compute because the product of two Gaussians is proportional to another Gaussian, which gives the closed form $\int\phi_i(\boldsymbol r)\phi_j(\boldsymbol r)d\boldsymbol r = \mathcal N(\boldsymbol\mu_i \mid \boldsymbol\mu_j, \boldsymbol\Sigma_i+\boldsymbol\Sigma_j)$. I like this function because it's analytic, but it doesn't give us a probability. Another thought is to use a multivariate t-test, but that seems to be useful only for comparing two normal distributions. I need to compare two mixtures with unequal degrees of freedom.
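For concreteness, here is a minimal sketch of the score above, restricted to one dimension to keep it short. The `(weight, mean, variance)` tuple representation, the function names, and the example mixtures are all illustrative assumptions, not part of the original setup. It relies on the standard identity that the integral of a product of two Gaussian densities equals a Gaussian density evaluated at the difference of the means, with the variances added.

```python
import math

def normal_pdf(x, mean, var):
    """Density of a 1-D normal distribution with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def overlap(gmm_a, gmm_b):
    """Closed-form integral of p_a(x) * p_b(x) for two 1-D GMMs.

    Each GMM is a list of (weight, mean, variance) tuples (an assumed layout).
    Uses: integral of N(x|m1,v1) * N(x|m2,v2) dx = N(m1 | m2, v1 + v2).
    """
    return sum(
        wa * wb * normal_pdf(ma, mb, va + vb)
        for wa, ma, va in gmm_a
        for wb, mb, vb in gmm_b
    )

def correlation_score(gmm_a, gmm_b):
    """C(p1, p2) = -log( int p1 p2 / int (p1^2 + p2^2) )."""
    numerator = overlap(gmm_a, gmm_b)
    denominator = overlap(gmm_a, gmm_a) + overlap(gmm_b, gmm_b)
    return -math.log(numerator / denominator)

# Hypothetical mixtures with different numbers of components.
gmm_1 = [(0.5, -1.0, 0.5), (0.5, 1.0, 0.5)]
gmm_2 = [(0.3, -1.2, 0.4), (0.3, 0.0, 1.0), (0.4, 1.1, 0.6)]

print(correlation_score(gmm_1, gmm_1))  # identical mixtures
print(correlation_score(gmm_1, gmm_2))  # different mixtures
```

Note that with the score as written, identical mixtures give $\log 2$ rather than 0, because the denominator then equals twice the numerator. By the Cauchy-Schwarz and AM-GM inequalities, $\log 2$ is also the smallest value the score can take.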

Best Answer

You're comparing two distributions, not two random variables, so I'm not sure where "probability" would come into play. The closest thing to what you describe is the Kullback-Leibler divergence, which measures the expected amount of extra information needed to encode a sample drawn from one distribution using a code optimized for the other.
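Unlike the overlap integral in the question, the Kullback-Leibler divergence $D_{\mathrm{KL}}(p\,\|\,q)=\int p(x)\log\frac{p(x)}{q(x)}dx$ between two Gaussian mixtures has no closed form, so in practice it is usually approximated. Below is a minimal Monte Carlo sketch for one-dimensional mixtures. The `(weight, mean, variance)` tuple layout, the function names, the sample count, and the example mixtures are assumptions made for illustration only.

```python
import math
import random

def gmm_pdf(x, gmm):
    """Density of a 1-D GMM given as a list of (weight, mean, variance) tuples."""
    return sum(
        w * math.exp(-0.5 * (x - m) ** 2 / v) / math.sqrt(2.0 * math.pi * v)
        for w, m, v in gmm
    )

def gmm_sample(gmm, rng):
    """Draw one sample: pick a component by its weight, then sample from it."""
    weights = [w for w, _, _ in gmm]
    _, mean, var = rng.choices(gmm, weights=weights, k=1)[0]
    return rng.gauss(mean, math.sqrt(var))

def kl_monte_carlo(gmm_p, gmm_q, n_samples=20000, seed=0):
    """Estimate KL(p || q) as the sample mean of log(p(x) / q(x)) with x ~ p."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    total = 0.0
    for _ in range(n_samples):
        x = gmm_sample(gmm_p, rng)
        total += math.log(gmm_pdf(x, gmm_p) / gmm_pdf(x, gmm_q))
    return total / n_samples

# Hypothetical mixtures with different numbers of components.
gmm_1 = [(0.5, -1.0, 0.5), (0.5, 1.0, 0.5)]
gmm_2 = [(0.3, -1.2, 0.4), (0.3, 0.0, 1.0), (0.4, 1.1, 0.6)]

print(kl_monte_carlo(gmm_1, gmm_2))  # estimate of KL(p1 || p2)
print(kl_monte_carlo(gmm_2, gmm_1))  # KL is asymmetric, so this generally differs
```

The estimate is a divergence, not a probability: it is zero when the two mixtures coincide and grows as they differ, and it is not symmetric in its arguments. This sketch does not guard against the density underflowing to zero for samples far in the tails of `gmm_q`.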