Solved – Evaluate mixture model

Tags: goodness of fit, mixture-distribution, model, normal distribution

I have a question concerning the evaluation of mixture models. Is there a gold standard for computing the goodness of fit of a mixture model?

What I am concerned about is how to evaluate whether one, two or three Gaussians fit a given distribution best. Of course, one could inspect this visually, but I am looking for an automated way that has a statistical meaning.

My initial idea was to measure the KS statistic between the observed distribution and distributions sampled using the estimated mean and variance of each model. Admittedly, I am not an expert on mixture models, so I might be missing something obvious here.

So I guess what I am looking for is some kind of likelihood ratio test that gives me the best-performing model for one, two or three overlapping distributions.

I am very thankful for any keywords and links that I can look up!

Best Answer

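You can use a model selection criterion such as AIC or BIC to compare the one-, two- and three-component models. However, these criteria only rank the candidate models against each other; they do not tell you whether the best-ranked model actually fits the data well. The same applies to the likelihood ratio.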
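As a rough illustration of the AIC/BIC comparison, here is a minimal sketch for one-dimensional data. Everything in it is an assumption made for the example: the synthetic data, the small EM routine `fit_gmm_1d`, its quantile-based initialisation and the variance floor. A library implementation such as scikit-learn's `GaussianMixture`, which has `aic` and `bic` methods, would normally replace the hand-written fit.

```python
import numpy as np

def component_densities(x, weights, means, sds):
    """Weighted normal densities, shape (n_points, n_components)."""
    z = (x[:, None] - means) / sds
    return weights * np.exp(-0.5 * z**2) / (sds * np.sqrt(2.0 * np.pi))

def fit_gmm_1d(x, k, n_iter=300):
    """Plain EM for a k-component 1-D Gaussian mixture; returns the final log-likelihood."""
    # Illustrative initialisation: means at evenly spaced quantiles,
    # pooled standard deviation, equal weights.
    means = np.quantile(x, (np.arange(k) + 0.5) / k)
    sds = np.full(k, x.std())
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point.
        dens = component_densities(x, weights, means, sds)
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: update weights, means and standard deviations.
        nk = resp.sum(axis=0)
        weights = nk / x.size
        means = (resp * x[:, None]).sum(axis=0) / nk
        sds = np.sqrt((resp * (x[:, None] - means) ** 2).sum(axis=0) / nk)
        sds = np.maximum(sds, 1e-3)  # assumed floor to avoid collapsing components
    return np.log(component_densities(x, weights, means, sds).sum(axis=1)).sum()

# Synthetic two-component data, used only to make the example runnable.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2.0, 1.0, 300), rng.normal(3.0, 1.0, 300)])

scores = {}
for k in (1, 2, 3):
    loglik = fit_gmm_1d(x, k)
    n_params = 3 * k - 1  # k means + k variances + (k - 1) free weights
    scores[k] = {
        "aic": 2 * n_params - 2 * loglik,
        "bic": n_params * np.log(x.size) - 2 * loglik,
    }
    print(k, scores[k])
```

Lower values are preferred. The scores only order the three candidates; a model can win this comparison and still fit poorly.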

A formal test of fit can be conducted with the chi-square goodness-of-fit test. Its result is very sensitive to the choice of bin width, though.
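A minimal sketch of such a chi-square test for a one-dimensional Gaussian mixture follows. The data are synthetic, and the `weights`, `means` and `sds` arrays are placeholders standing in for the estimates from your own fit. The degrees of freedom are taken as bins minus one minus the number of estimated parameters, which is only an approximation when the parameters come from an ungrouped maximum-likelihood fit. The function names are made up for this example.

```python
import numpy as np
from scipy import stats

def mixture_cdf(t, weights, means, sds):
    """CDF of a 1-D Gaussian mixture evaluated at the points t."""
    t = np.asarray(t, dtype=float)
    return (weights * stats.norm.cdf(t[:, None], loc=means, scale=sds)).sum(axis=1)

def chi_square_gof(x, weights, means, sds, n_bins):
    """Chi-square statistic, degrees of freedom and p-value for equal-width bins."""
    observed, edges = np.histogram(x, bins=n_bins)
    # Open the outer bins so the expected probabilities sum to one.
    edges[0], edges[-1] = -np.inf, np.inf
    expected = x.size * np.diff(mixture_cdf(edges, weights, means, sds))
    stat = ((observed - expected) ** 2 / expected).sum()
    n_params = 3 * len(weights) - 1  # means, variances and free weights
    dof = n_bins - 1 - n_params
    return stat, dof, stats.chi2.sf(stat, dof)

# Synthetic data, used only to make the example runnable.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2.0, 1.0, 300), rng.normal(3.0, 1.0, 300)])

# Placeholder values: substitute the estimates from your fitted model.
weights = np.array([0.5, 0.5])
means = np.array([-2.0, 3.0])
sds = np.array([1.0, 1.0])

# Repeat the test for several bin counts to see how much the result moves.
results = {b: chi_square_gof(x, weights, means, sds, b) for b in (10, 20, 40)}
for b, (stat, dof, p_value) in results.items():
    print(f"bins={b:2d}  chi2={stat:7.2f}  dof={dof:2d}  p={p_value:.3f}")
```

Running it with several bin counts makes the sensitivity visible: the p-value can change noticeably between binnings of the same data.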

A less formal, more visual goodness-of-fit check is the QQ envelope, which is obtained as follows:

  1. Fit the mixture model to your data.
  2. Simulate a large number $N$ of samples from the fitted model, each of the same size as the original sample.
  3. Compute the QQ plot of each simulated sample against the original sample, and plot all of them together. This generates an envelope that shows how well or how poorly your model reproduces the data.
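The steps above can be sketched as follows for one-dimensional data. This is only an illustration under stated assumptions: the data are synthetic, the `weights`, `means` and `sds` arrays are placeholders for your fitted parameters, and the pointwise 95% band is an addition of this sketch that summarises the simulated curves rather than part of the recipe itself. The function names are made up, and the plotting helper assumes matplotlib is installed.

```python
import numpy as np

def qq_envelope(x, weights, means, sds, n_sim=200, level=0.95, seed=0):
    """Simulate n_sim samples from the mixture and return sorted quantiles plus a pointwise band."""
    rng = np.random.default_rng(seed)
    # Draw a component label for every point, then draw from that component.
    comp = rng.choice(len(weights), size=(n_sim, x.size), p=weights)
    sims = np.sort(rng.normal(means[comp], sds[comp]), axis=1)
    alpha = (1.0 - level) / 2.0
    lower = np.quantile(sims, alpha, axis=0)
    upper = np.quantile(sims, 1.0 - alpha, axis=0)
    return np.sort(x), sims, lower, upper

def plot_qq_envelope(x_sorted, sims, lower, upper):
    """Overlay all simulated QQ curves, the pointwise band and the identity line."""
    import matplotlib.pyplot as plt  # imported here so the computation runs without it
    fig, ax = plt.subplots()
    ax.plot(x_sorted, sims.T, color="grey", alpha=0.05, linewidth=0.5)
    ax.plot(x_sorted, lower, "b--")
    ax.plot(x_sorted, upper, "b--")
    ax.plot(x_sorted, x_sorted, "r-")  # a perfect fit would follow this line
    ax.set_xlabel("observed quantiles")
    ax.set_ylabel("simulated quantiles")
    return fig

# Synthetic data, used only to make the example runnable.
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(-2.0, 1.0, 300), rng.normal(3.0, 1.0, 300)])

# Placeholder values: substitute the estimates from your fitted model.
weights = np.array([0.5, 0.5])
means = np.array([-2.0, 3.0])
sds = np.array([1.0, 1.0])

x_sorted, sims, lower, upper = qq_envelope(x, weights, means, sds)
# Share of observed order statistics that fall outside the pointwise band.
outside = np.mean((x_sorted < lower) | (x_sorted > upper))
print(f"fraction of observed quantiles outside the band: {outside:.3f}")
```

Calling `plot_qq_envelope(x_sorted, sims, lower, upper)` draws the envelope. Stretches where the identity line leaves the cloud of grey curves are where the model reproduces the data poorly.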

You can use this tool to identify areas where the model produces a poor fit.