Probability – Do Mixture Models Defy Entropy Principles?

entropy, mixture-distribution, normal-distribution, probability

Recently, I learned about the principle of Maximum Entropy with regard to probability distributions (https://www.youtube.com/watch?v=2gTrsLVnp9c). In particular, when certain "information" (i.e. constraints) is available about some class of probability distribution functions (e.g. the domain over which the probability function is defined, its expectation, etc.), we can use the principle of Maximum Entropy to determine the "most informative" probability distribution function from this class of probability distribution functions in this situation.
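For concreteness, here is a sketch of the formal statement I have in mind, written for the particular case where the constraints are a known mean and variance on the whole real line (this specific choice of constraints is my own illustration, not something stated explicitly above):

```latex
% Maximum (differential) entropy as a constrained optimization over densities p on R:
\max_{p}\; H(p) \;=\; -\int_{-\infty}^{\infty} p(x)\,\log p(x)\,dx
\quad\text{subject to}\quad
\int_{-\infty}^{\infty} p(x)\,dx = 1,\qquad
\mathbb{E}[X] = \mu,\qquad
\operatorname{Var}(X) = \sigma^{2}.
```

With support on the whole real line and these two moment constraints, the well-known maximizer is the normal density N(mu, sigma^2), which is exactly the case described in the next paragraph.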

Apparently, in many real-world situations (e.g. when the data is continuous, can take any value from negative infinity to positive infinity, and we fix its mean and variance), the Normal Distribution ends up being the probability distribution function with the Maximum Entropy, and thus is often the "most informative" choice of probability distribution function when compared to any other candidate.
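As a quick numerical sanity check of this claim, here is a rough sketch using NumPy/SciPy (the mixture parameters below are made up purely for illustration): it compares the differential entropy of a standard normal with that of a two-component Gaussian mixture that has the same mean and variance, and the normal should come out with the larger entropy.

```python
import numpy as np
from scipy import stats

# Grid for numerically approximating the differential entropy -∫ p log p dx
x = np.linspace(-10, 10, 200_001)
dx = x[1] - x[0]

def entropy(pdf_vals):
    """Differential entropy of a density evaluated on the grid (in nats)."""
    p = np.clip(pdf_vals, 1e-300, None)  # guard against log(0)
    return -np.sum(p * np.log(p)) * dx

# Candidate 1: standard normal (mean 0, variance 1)
p_normal = stats.norm.pdf(x, loc=0.0, scale=1.0)

# Candidate 2: two-component Gaussian mixture with the SAME mean 0 and variance 1:
# 0.5*N(-a, s^2) + 0.5*N(+a, s^2) with a^2 + s^2 = 1 (a, s chosen for illustration)
a, s = 0.8, 0.6
p_mix = 0.5 * stats.norm.pdf(x, loc=-a, scale=s) + 0.5 * stats.norm.pdf(x, loc=+a, scale=s)

print("entropy of normal :", entropy(p_normal))  # ≈ 0.5*log(2*pi*e) ≈ 1.4189 nats
print("entropy of mixture:", entropy(p_mix))     # strictly smaller, despite same mean/variance
```

Running this gives roughly 1.419 nats for the normal (matching the closed form 0.5*log(2*pi*e)) and a smaller value for the bimodal mixture, as the Maximum Entropy principle predicts.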

However, by the logic from the video, there are many real-world situations in which the principle of Maximum Entropy would suggest that the Normal Distribution is the most informative probability distribution function to use, but in reality a "non-normal" probability distribution function ends up being a better choice. In my opinion, a clear example of this is "Mixture Distributions" (e.g. Mixture Models). For example, there might be some instances in which the data contains several distinct and "latent" groups, and each of these groups comes from a distinct (normal) distribution. A (correctly specified) mixture distribution would be able to combine these different distributions and possibly result in a "better model" than a single normal distribution (where the normal distribution was the candidate suggested via the Maximum Entropy Principle). I could see this being the case even if we did not know a priori that our data contains several distinct latent groups.
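To make the "better model" claim concrete, here is a minimal sketch (assuming NumPy, SciPy and scikit-learn; the group means, spreads and sample sizes are made up for illustration) that fits a single normal and a two-component Gaussian mixture to data drawn from two latent groups and compares their average log-likelihood on that data:

```python
import numpy as np
from scipy import stats
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Hypothetical data with two latent groups, each normally distributed
data = np.concatenate([rng.normal(-3.0, 1.0, 500),
                       rng.normal(+3.0, 1.0, 500)]).reshape(-1, 1)

# Candidate 1: a single normal distribution, fit by maximum likelihood
mu, sigma = data.mean(), data.std()
ll_normal = stats.norm.logpdf(data, loc=mu, scale=sigma).mean()

# Candidate 2: a two-component Gaussian mixture, fit by EM
gmm = GaussianMixture(n_components=2, random_state=0).fit(data)
ll_mixture = gmm.score(data)  # average log-likelihood per sample

print("average log-likelihood, single normal  :", ll_normal)
print("average log-likelihood, 2-comp. mixture:", ll_mixture)
```

On data like this, the mixture typically attains a noticeably higher average log-likelihood than the single normal, even though the single normal is the Maximum Entropy choice given only the sample's mean and variance.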

My Question: Do Mixture Distributions defy the principle of Maximum Entropy – or could the principle of Maximum Entropy somehow still be used in such a context to suggest that Mixture Distributions are the most "informative" distributions?

Best Answer

we can use the principle of Maximum Entropy to determine the "most informative" probability distribution function from this class of probability distribution functions in this situation.

More entropy is less information: entropy measures uncertainty, so the maximum-entropy distribution is the one that assumes the least beyond the stated constraints, i.e. the least informative distribution consistent with them.

However, by the logic from the video, there are many real-world situations in which the principle of Maximum Entropy would suggest that the Normal Distribution is the most informative probability distribution function to use, but in reality a "non-normal" probability distribution function ends up being a better choice.

If you choose a normal distribution as the solution to some problem, but something else turns out to be the real answer, then possibly you did not have the correct information to solve the problem, or you made the analysis incorrectly.

'Being a better choice' is not a contradiction, because "better" is measured differently in the two cases: the mixture is better in terms of fit to the observed data (e.g. likelihood), while the normal distribution is better only in the sense of maximizing entropy subject to the stated constraints (here, the mean and the variance).
