Solved – When is a Gaussian mixture model more effective than a single Gaussian?

gaussian-mixture-distribution, machine-learning

In my experiment, with about 600 training samples, each a vector of length 30, a single Gaussian component unexpectedly performed much better. Given that, I would like to know how or when it is better to use a mixture of Gaussians (any rule of thumb would help).

Best Answer

In your case a single-kernel model has $D + D(D+1)/2 = 495$ degrees of freedom: $30$ from the mean and $465$ from the free elements of the symmetric covariance matrix. Thus you have fewer than 2 samples per degree of freedom, and a two-kernel model would have fewer than one sample per degree of freedom. A rule of thumb is to require at least 3-5, and preferably $>10$, samples per degree of freedom for model estimation.
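The parameter counting above can be checked with a short script. This is a sketch under the assumption of a full covariance matrix per component; for a $k$-component mixture the count also includes the $k-1$ free mixing weights:

```python
def gmm_dof(n_components, dim):
    """Degrees of freedom of a Gaussian mixture with full covariances:
    per component, `dim` mean entries plus dim*(dim+1)/2 free entries of
    the symmetric covariance; plus n_components - 1 free mixing weights."""
    per_component = dim + dim * (dim + 1) // 2
    return n_components * per_component + (n_components - 1)

n_samples, dim = 600, 30
for k in (1, 2):
    dof = gmm_dof(k, dim)
    print(f"{k} kernel(s): {dof} dof, {n_samples / dof:.2f} samples per dof")
```

With 600 samples this gives roughly 1.2 samples per degree of freedom for one kernel and about 0.6 for two, well below the rule of thumb.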

Your specific case is probably one of overfitting: the mixture model has more degrees of freedom and can therefore fit the training data better, including its statistical fluctuations. Sometimes this is referred to as "fitting the noise" or simply "overfitting". The smaller number of degrees of freedom in the single-kernel model gave better generalization.
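The overfitting effect is visible even with a single Gaussian: the maximum-likelihood fit scores higher on the data it was fit to than on fresh data from the same distribution. A minimal sketch (the sample size and dimension mirror the question; `scipy` is an assumed dependency):

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)
dim, n = 30, 600

# Train and test sets drawn from the same standard normal distribution.
train = rng.standard_normal((n, dim))
test = rng.standard_normal((n, dim))

# Fit a single Gaussian to the training data by maximum likelihood.
mu = train.mean(axis=0)
cov = np.cov(train, rowvar=False)

# Average log-likelihood per sample: the fitted model scores the training
# set higher than fresh data, purely by having fit its noise.
train_ll = multivariate_normal.logpdf(train, mean=mu, cov=cov).mean()
test_ll = multivariate_normal.logpdf(test, mean=mu, cov=cov).mean()
print(f"train: {train_ll:.2f}  test: {test_ll:.2f}")
```

A model with even more degrees of freedom, such as a two-kernel mixture, would widen this train-test gap further.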

More generally, this is a problem of model selection.

Broadly speaking, it's better to use a mixture of Gaussians when that kind of distribution more accurately reflects the true distribution of your data. Of course, you don't actually know the true distribution of the data.

In some cases you can tell whether a single kernel or a mixture of kernels is required by qualitative data analysis. More commonly, criteria such as the Bayesian Information Criterion (BIC) or the Akaike Information Criterion (AIC) are used to formalize the idea of balancing goodness of fit to your limited data sample against the number of degrees of freedom the model uses.
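As an illustration (not part of the original answer), BIC can select the number of components automatically. A sketch using scikit-learn's `GaussianMixture` on synthetic 2-D data where two well-separated components are the right answer:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Synthetic data: two well-separated Gaussian clusters in 2-D.
X = np.vstack([
    rng.normal(loc=0.0, scale=1.0, size=(300, 2)),
    rng.normal(loc=6.0, scale=1.0, size=(300, 2)),
])

# Fit mixtures with 1..4 components and score each by BIC (lower is better).
bics = {k: GaussianMixture(n_components=k, random_state=0).fit(X).bic(X)
        for k in range(1, 5)}
best_k = min(bics, key=bics.get)
print("BIC per k:", {k: round(v, 1) for k, v in bics.items()})
print("selected number of components:", best_k)
```

BIC penalizes each extra component's parameters, so unlike raw log-likelihood it does not always prefer the larger model; in high dimensions with few samples it tends to favor the single Gaussian, as in the question.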
