Solved – Is the posterior distribution on means in a Bayesian Gaussian mixture model with symmetric priors Gaussian?

bayesian, gaussian mixture distribution, inference

I am reading through a document on learning Gaussian mixture models in Infer.NET. They assume the data is generated from 2 Gaussians, where the prior distribution on the means is Gaussian and the prior distribution on the precisions is a Wishart distribution. The prior distribution on the mixture weights is a Dirichlet distribution. All of these priors are symmetric in the two Gaussians.
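For reference, the generative model described above can be sketched in NumPy. This is only an illustration under assumed settings: 2-D data, a N(0, 100·I) prior on each mean, a Wishart prior with dim + 1 degrees of freedom and identity scale on each precision, and a Dirichlet(1, 1) prior on the weights. The Infer.NET tutorial's own hyperparameters may differ. The point is that every component draws its parameters from the same prior, which is what makes the model symmetric.

```python
import numpy as np

rng = np.random.default_rng(3)
K, dim, n = 2, 2, 500   # assumed: 2 components, 2-D data, 500 points

def sample_wishart(df, scale, rng):
    """Wishart(df, scale) draw for integer df >= dim:
    the sum of df outer products of N(0, scale) vectors."""
    g = rng.multivariate_normal(np.zeros(len(scale)), scale, size=df)
    return g.T @ g

# Symmetric priors: every component uses the same hyperparameters
weights = rng.dirichlet(np.ones(K))                        # Dirichlet prior on mixture weights
means = rng.multivariate_normal(np.zeros(dim), 100.0 * np.eye(dim), size=K)  # Gaussian prior on means
precisions = np.array([sample_wishart(dim + 1, np.eye(dim), rng)
                       for _ in range(K)])                 # Wishart prior on precisions

# Generate data: choose a component for each point, then draw from that Gaussian
z = rng.choice(K, size=n, p=weights)
covs = np.linalg.inv(precisions)
x = np.array([rng.multivariate_normal(means[k], covs[k]) for k in z])
```

Because nothing in the prior distinguishes component 0 from component 1, swapping the two components' parameters gives exactly the same joint distribution.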

They run inference on some data and get back that the posterior distribution on each of the two means is the same Gaussian. They then go on to discuss how to break the symmetry in the model so that the means can converge to different Gaussians.

How can it possibly be that the posteriors on the means are Gaussian? If I observe a million samples from a Gaussian mixture model (say, unbeknownst to me, the data is created by choosing with equal probability either a normal distribution with mean 0 and variance 1 or a normal distribution with mean 100 and variance 1), it should be ABSOLUTELY CLEAR what the two means and standard deviations are. The symmetry of course means that the model doesn't know whether the first or the second Gaussian has mean 0 or mean 100, so shouldn't the posterior have two peaks, one near 0 and one near 100? If so, it's obviously not Gaussian.
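The "same Gaussian for both means" behaviour can be reproduced with a much simpler fitting algorithm. The sketch below uses plain EM, as a stand-in for the approximate inference in the Infer.NET tutorial (it is not that algorithm), on the 0/100 example above, assuming unit variances and equal weights so that only the means are updated. Started from identical means, both components receive identical responsibilities and stay equal forever; a different starting point breaks the symmetry.

```python
import numpy as np

def em_means(x, mu_init, n_iter=50):
    """EM for a 2-component 1-D mixture with unit variances and equal
    weights (both assumed known); only the means are updated."""
    mu = np.array(mu_init, dtype=float)
    for _ in range(n_iter):
        # E-step: responsibilities r[i, k] proportional to N(x_i; mu_k, 1)
        logp = -0.5 * (x[:, None] - mu[None, :]) ** 2
        logp -= logp.max(axis=1, keepdims=True)   # for numerical stability
        r = np.exp(logp)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: each mean is the responsibility-weighted average of the data
        mu = (r * x[:, None]).sum(axis=0) / r.sum(axis=0)
    return mu

rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(0.0, 1.0, 500), rng.normal(100.0, 1.0, 500)])

symmetric = em_means(x, [50.0, 50.0])   # identical start: both means stuck at the overall mean
broken = em_means(x, [40.0, 60.0])      # different start: means separate to about 0 and 100
print(symmetric, broken)
```

The symmetric start is a fixed point of the updates, which is why a deterministic, perfectly symmetric setup reports one identical answer for both components, however clear-cut the data are.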

I would appreciate any help in this matter.

Best Answer

The paper Bayesian Inference for Mixture: The Label Switching Problem says:

A K-component mixture distribution is invariant to permutations of the labels of the components. As a consequence, in a Bayesian framework, the posterior distribution of the mixture parameters has theoretically K! modes.
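The permutation invariance in this quote is easy to check numerically. The sketch below uses made-up parameters for a 3-component 1-D mixture and evaluates the log-likelihood under all 3! = 6 relabelings of the components. They all agree, whereas permuting only the means (a genuinely different mixture) changes the value. With symmetric priors, the prior is invariant too, so every posterior mode comes with K! - 1 relabeled copies.

```python
import itertools
import numpy as np

def mixture_loglik(x, weights, means, sds):
    """Log-likelihood of 1-D data x under a Gaussian mixture."""
    comp = (np.log(weights)
            - 0.5 * np.log(2 * np.pi * sds ** 2)
            - 0.5 * ((x[:, None] - means) / sds) ** 2)   # shape (n, K)
    return np.logaddexp.reduce(comp, axis=1).sum()

rng = np.random.default_rng(4)
x = rng.normal(size=300)
# Illustrative parameters (assumed, not from any source)
weights = np.array([0.2, 0.3, 0.5])
means = np.array([-1.0, 0.0, 2.0])
sds = np.array([0.5, 1.0, 1.5])

# Relabel the components in every possible order: K! = 6 parameter sets
values = [mixture_loglik(x, weights[list(p)], means[list(p)], sds[list(p)])
          for p in itertools.permutations(range(3))]
print(values)
```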

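To me, this answers the question: no. In general, the posterior distribution is not Gaussian. The Gaussians Infer.NET reports come from approximate inference that represents each mean's posterior by a single Gaussian; such an approximation cannot capture the K! modes, and with perfectly symmetric priors it returns the same Gaussian for every component.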
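To see the multimodality directly, the exact posterior of the two means can be evaluated on a grid for a small, simplified model. This is a sketch under assumptions: known unit variances, fixed equal weights, identical N(0, 10²) priors on both means, and simulated data from N(-2, 1) and N(2, 1) (closer together than the question's 0/100 example so that a coarse grid suffices). The marginal posterior of mu1 has two peaks, one near each true mean, and almost no mass in between, so it is clearly not Gaussian.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data (assumed): equal-weight mixture of N(-2, 1) and N(2, 1)
n = 200
labels = rng.integers(0, 2, size=n)
x = np.where(labels == 0, -2.0, 2.0) + rng.standard_normal(n)

# Grid over (mu1, mu2); weights fixed at 0.5 and variances at 1
grid = np.linspace(-5.0, 5.0, 101)
mu1 = grid[:, None, None]   # axis 0 indexes mu1
mu2 = grid[None, :, None]   # axis 1 indexes mu2

def log_normal(x, mu):
    """Log density of N(mu, 1) at x."""
    return -0.5 * (x - mu) ** 2 - 0.5 * np.log(2 * np.pi)

# log p(x | mu1, mu2), summed over observations -> shape (101, 101)
loglik = np.logaddexp(np.log(0.5) + log_normal(x, mu1),
                      np.log(0.5) + log_normal(x, mu2)).sum(axis=-1)
# Identical (symmetric) N(0, 10^2) prior on each mean
logprior = -0.5 * (grid[:, None] ** 2 + grid[None, :] ** 2) / 100.0
logpost = loglik + logprior
post = np.exp(logpost - logpost.max())
post /= post.sum()

# Marginal posterior of mu1: sum out mu2
marg_mu1 = post.sum(axis=1)
neg = grid < 0
peak_neg = grid[neg][np.argmax(marg_mu1[neg])]
peak_pos = grid[~neg][np.argmax(marg_mu1[~neg])]
print("modes of p(mu1 | x) near", peak_neg, "and", peak_pos)
```

The joint posterior is exactly symmetric under swapping mu1 and mu2, so each of the two modes carries about half of the mass; a single Gaussian fitted to this marginal would be centred near 0, where the true posterior is essentially zero.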

Related Solutions

Solved – 2-Gaussian mixture model inference with MCMC and PyMC

The problem is caused by the way PyMC draws samples for this model. As explained in section 5.8.1 of the PyMC documentation, all elements of an array variable are updated together, with a single accept/reject decision. For small arrays like center this is not a problem, but for a large array like category it leads to a low acceptance rate. You can check the acceptance rate with:

print(mcmc.step_method_dict[category][0].ratio)
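Why a joint update of a large array accepts so rarely can be seen in a toy Metropolis sampler written directly in NumPy. This does not use PyMC and is not how PyMC implements its step methods; the target (1000 independent binary variables, each equal to 1 with probability 0.9) and the uniform proposal are assumptions chosen for illustration. Proposing all elements at once requires the whole proposed vector to be plausible, which almost never happens, while proposing one element at a time gives a healthy acceptance rate (about 0.6 for this target).

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1000
p_one = 0.9                          # target: each z_i is 1 with probability 0.9
log_p = np.log([1 - p_one, p_one])   # log target probability of the values 0 and 1

def joint_sweep(z):
    """Propose all n values at once (uniformly); one accept/reject decision."""
    z_new = rng.integers(0, 2, size=z.size)
    log_ratio = log_p[z_new].sum() - log_p[z].sum()   # symmetric proposal
    if np.log(rng.random()) < log_ratio:
        return z_new, 1
    return z, 0

def single_site_sweep(z):
    """Propose each z_i separately (uniformly); n accept/reject decisions."""
    accepted = 0
    for i in range(z.size):
        proposal = rng.integers(0, 2)
        if np.log(rng.random()) < log_p[proposal] - log_p[z[i]]:
            z[i] = proposal
            accepted += 1
    return z, accepted

z = (rng.random(n) < p_one).astype(int)   # start from a typical state
joint_accepts = 0
for _ in range(200):
    z, a = joint_sweep(z)
    joint_accepts += a
joint_rate = joint_accepts / 200

z = (rng.random(n) < p_one).astype(int)
single_accepts = 0
for _ in range(20):
    z, a = single_site_sweep(z)
    single_accepts += a
single_rate = single_accepts / (20 * n)
print("joint:", joint_rate, "single-site:", single_rate)
```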

The solution suggested in the documentation is to use an array of scalar-valued variables. In addition, you need to configure some of the proposal distributions since the default choices are bad. Here is the code that works for me:

import numpy as np
import pymc as pm  # PyMC 2.x API

# samples, nsamples, mu1_true, mu2_true and sig1_true are assumed to be
# defined by the data-generation code from the question
sigmas = pm.Normal('sigmas', mu=0.1, tau=1000, size=2)
centers = pm.Normal('centers', [0.3, 0.7], [1/(0.1)**2, 1/(0.1)**2], size=2)
alpha  = pm.Beta('alpha', alpha=2, beta=3)
# one scalar Categorical per observation, so each one is updated on its own
category = pm.Container([pm.Categorical("category%i" % i, [alpha, 1 - alpha])
                         for i in range(nsamples)])
observations = pm.Container([pm.Normal('samples_model%i' % i,
                   mu=centers[category[i]], tau=1/(sigmas[category[i]]**2),
                   value=samples[i], observed=True) for i in range(nsamples)])
model = pm.Model([observations, category, alpha, sigmas, centers])
mcmc = pm.MCMC(model)
# initialize in a good place to reduce the number of steps required
centers.value = [mu1_true, mu2_true]
# set a custom proposal for centers, since the default is bad
mcmc.use_step_method(pm.Metropolis, centers, proposal_sd=sig1_true/np.sqrt(nsamples))
# set a custom proposal for category, since the default is bad
for i in range(nsamples):
    mcmc.use_step_method(pm.DiscreteMetropolis, category[i], proposal_distribution='Prior')
mcmc.sample(100)  # beware: sampling takes much longer now
# check the acceptance rates
print(mcmc.step_method_dict[category[0]][0].ratio)
print(mcmc.step_method_dict[centers][0].ratio)
print(mcmc.step_method_dict[alpha][0].ratio)

The proposal_sd and proposal_distribution options are explained in section 5.7.1. For the centers, I set the proposal to roughly match the standard deviation of the posterior, which is much smaller than the default because of the amount of data. PyMC does attempt to tune the width of the proposal, but this only works if the acceptance rate is sufficiently high to begin with. For category, the default is proposal_distribution='Poisson', which gives poor results (I don't know why this is, but it certainly doesn't sound like a sensible proposal for a binary variable).
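The choice proposal_sd=sig1_true/np.sqrt(nsamples) can be motivated by the conjugate posterior of a normal mean with known noise standard deviation sigma: with a normal prior of standard deviation prior_sd and n_k points assigned to the component, the posterior variance is 1/(1/prior_sd**2 + n_k/sigma**2), which approaches sigma**2/n_k as n_k grows. The sketch below evaluates this formula. sigma = 0.1 and prior_sd = 0.1 mirror the hyperparameters in the code above, while n_k = 1000 is an assumed value for illustration.

```python
import math

def posterior_sd_of_mean(sigma, n_k, prior_sd):
    """Posterior sd of a normal mean given n_k observations with known
    noise sd `sigma` and a normal prior with sd `prior_sd` (conjugate update)."""
    posterior_precision = 1.0 / prior_sd ** 2 + n_k / sigma ** 2
    return 1.0 / math.sqrt(posterior_precision)

sigma = 0.1      # matches the prior mean of `sigmas` in the code above
prior_sd = 0.1   # `centers` uses tau = 1/(0.1)**2 in the code above
n_k = 1000       # assumed number of points assigned to one component

exact = posterior_sd_of_mean(sigma, n_k, prior_sd)
approx = sigma / math.sqrt(n_k)
print(exact, approx)
```

With equal mixture weights, each component receives only about nsamples/2 points, so the posterior standard deviation is closer to sig1_true*sqrt(2/nsamples). The answer's setting is therefore within a factor of sqrt(2), which is close enough for PyMC's proposal tuning to take over.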
