When discussing the concept of mixtures of distributions in my machine learning textbook, the authors state the following:
A Gaussian mixture model is a universal approximator of densities, in the sense that any smooth density can be approximated with any specific nonzero amount of error by a Gaussian mixture model with enough components.
I have only a basic background in probability and statistics, and I can't make sense of what this passage is claiming.
I would greatly appreciate it if someone could explain this in a way that is understandable at my level.
Best Answer
The idea is that an arbitrary density $f(\cdot)$ on $\mathbb{R}$ can be approximated by a Gaussian mixture model $$g_k(\cdot;\boldsymbol{\omega},\boldsymbol{\mu},\boldsymbol{\sigma})=\sum_{i=1}^k \omega_i\, \varphi(\cdot;\mu_i,\sigma_i),$$ where $\varphi(\cdot;\mu_i,\sigma_i)$ denotes the Normal density with mean $\mu_i$ and standard deviation $\sigma_i$, and the weights $\omega_i\ge 0$ sum to one. "Approximated" here means $$\mathfrak{D}\big(f(\cdot),g_k(\cdot;\boldsymbol{\omega},\boldsymbol{\mu},\boldsymbol{\sigma})\big)\stackrel{k\to\infty}{\longrightarrow}0$$ for some specific functional distance $\mathfrak{D}(\cdot,\cdot)$: by taking enough components $k$, the mixture can be brought as close to $f$ as you like. Note that this holds only for weak enough notions of distance $\mathfrak{D}$; it is not true for every way of measuring the distance between two densities.
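To make this concrete, here is a small numerical sketch (my own illustration, not from the textbook): approximate a Laplace density by a mixture of $k$ Gaussians whose means sit on an even grid, with weights proportional to the target density at each mean and a common standard deviation that shrinks as the grid gets finer. This is a simple Riemann-sum/kernel-style construction, not an optimal fit, but the $L_1$ error still shrinks as $k$ grows.

```python
import numpy as np
from scipy.stats import norm, laplace

def gmm_approx(f, k, lo=-8.0, hi=8.0):
    """Build a k-component Gaussian mixture approximating density f.

    Components are placed on an even grid over [lo, hi]; each weight is
    proportional to f at that grid point, and the common standard
    deviation equals the grid spacing (so it shrinks as k grows).
    """
    mu = np.linspace(lo, hi, k)      # component means on a grid
    h = mu[1] - mu[0]                # grid spacing
    w = f(mu) * h                    # approximate mass near each mean
    w = w / w.sum()                  # renormalize: weights sum to one
    sigma = h                        # bandwidth shrinks with k

    def g(x):
        x = np.asarray(x)[..., None]
        # mixture density: weighted sum of Normal densities
        return (w * norm.pdf(x, mu, sigma)).sum(axis=-1)

    return g

# Target: the standard Laplace density (not itself a Gaussian).
f = laplace.pdf
x = np.linspace(-8.0, 8.0, 2001)
dx = x[1] - x[0]

errs = {}
for k in (5, 20, 100):
    g = gmm_approx(f, k)
    # L1 distance between target and mixture, by Riemann sum
    errs[k] = np.sum(np.abs(f(x) - g(x))) * dx
    print(f"k={k:4d}  L1 error ~ {errs[k]:.4f}")
```

Running this shows the $L_1$ error dropping as the number of components increases, which is exactly the "enough components" clause in the textbook's statement.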