Gaussian Mixture Model – How to Evaluate Loss Using Scikit-Learn

I successfully modeled my data using a Gaussian Mixture Model in scikit-learn, but I can't figure out how to quantify "how good" the model is by calculating a loss.

My first thought was to compare a KDE of the data against a KDE of samples drawn from the model, but then I realized that the result depends too much on the KDE settings.

What is the standard way to evaluate Gaussian Mixture Model performance and fit?

Best Answer

The trade-off in a GMM is between the number of Gaussian distributions (the number of components) and the likelihood of your model.

The difficulty is that there is no "ground truth" for the number of distributions. The more components you add, the higher the likelihood on the training data; the fewer you add, the lower it is. So the likelihood alone will always push you toward more components.
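To make this concrete, here is a minimal sketch (not taken from the question's data) that fits mixtures of increasing size and prints the average log-likelihood per sample returned by `GaussianMixture.score`. The synthetic two-cluster data, the list of component counts, and the variable names are all assumptions chosen for illustration. Because EM can land in a local optimum, the increase is typical rather than guaranteed at every step.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Hypothetical 1-D data: two well-separated Gaussian clusters.
rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(-3, 1, 300), rng.normal(3, 1, 300)]).reshape(-1, 1)

avg_loglik = {}
for k in (1, 2, 4, 8):
    gmm = GaussianMixture(n_components=k, n_init=3, random_state=0).fit(X)
    # score() returns the mean log-likelihood per sample on the given data.
    avg_loglik[k] = gmm.score(X)

for k, ll in avg_loglik.items():
    print(f"n_components={k}: average log-likelihood = {ll:.3f}")
```

Here the likelihood is evaluated on the same data used for fitting, which is exactly why it keeps rewarding extra components.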

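To model that trade-off, the classic approach is to use AIC or BIC. Both penalize the log-likelihood by the number of model parameters, and lower values indicate a better trade-off. In scikit-learn, a fitted GaussianMixture exposes them as the aic(X) and bic(X) methods.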

https://en.wikipedia.org/wiki/Akaike_information_criterion
https://en.wikipedia.org/wiki/Bayesian_information_criterion
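As a rough sketch of how this could look in scikit-learn, the snippet below fits one `GaussianMixture` per candidate component count and picks the count with the lowest criterion value using the model's `aic` and `bic` methods. The synthetic data, the candidate range of 1 to 6, and names such as `best_k_bic` are assumptions for illustration, not something from the question.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Hypothetical 1-D data: two well-separated Gaussian clusters.
rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(-3, 1, 300), rng.normal(3, 1, 300)]).reshape(-1, 1)

candidates = range(1, 7)  # assumed search range for the number of components
models = [
    GaussianMixture(n_components=k, n_init=3, random_state=0).fit(X)
    for k in candidates
]

# Lower is better for both criteria.
aic = [m.aic(X) for m in models]
bic = [m.bic(X) for m in models]

best_k_aic = candidates[int(np.argmin(aic))]
best_k_bic = candidates[int(np.argmin(bic))]

for k, a, b in zip(candidates, aic, bic):
    print(f"n_components={k}: AIC={a:.1f}  BIC={b:.1f}")
print("lowest AIC at n_components =", best_k_aic)
print("lowest BIC at n_components =", best_k_bic)
```

BIC penalizes extra parameters more heavily than AIC once the sample is reasonably large, so it tends to choose the smaller model when the two disagree.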