Relation between different data distributions and ML models

distributions, machine-learning, modeling

Although generalized linear models incorporate different distributions (e.g., Gaussian, Poisson) into the modelling process, I am not sure whether a similar usage exists in machine learning. It seems that, other than the i.i.d. assumption, statistical distributions are largely irrelevant in ML modelling. Are there any instances where statistical distributions are incorporated into ML models? Is there such an application? What is the relation between the two?

Best Answer

ML models define their own distributions ($p(y \mid x)$ for discriminative modeling, $p(x)$ for generative modeling, etc.), whose likelihood is maximized over the training data (or used to derive the posterior in Bayesian analysis). I see ML as a generalization of probabilistic modeling in which the distribution itself is inferred from the data, often with few distributional assumptions. The fact that the data come from an unknown distribution, one that cannot easily be modeled in closed form, is precisely the impetus for using more expressive neural networks as distributions.
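To make this concrete, here is a minimal sketch of fitting a discriminative model by maximum likelihood: logistic regression defines $p(y = 1 \mid x)$ directly, and training it is gradient ascent on the log-likelihood. The data and all names here are synthetic/illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data drawn from a logistic model (illustrative example).
X = rng.normal(size=(200, 2))
true_w = np.array([1.5, -2.0])
y = (rng.uniform(size=200) < 1 / (1 + np.exp(-X @ true_w))).astype(float)

# Logistic regression defines the conditional distribution
# p(y=1|x) = sigmoid(w.x); we maximize the log-likelihood of the
# training data by gradient ascent.
w = np.zeros(2)
lr = 0.5
for _ in range(1000):
    p = 1 / (1 + np.exp(-X @ w))
    w += lr * X.T @ (y - p) / len(y)   # gradient of the mean log-likelihood

print("estimated weights:", w)         # approaches true_w with enough data
```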

In this setting, common statistical distributions often act as priors over the parameters (as in Bayesian neural networks), and in that way the parameters themselves are described by a multivariate statistical distribution.
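As one concrete instance of that connection, placing a Gaussian prior $w \sim \mathcal{N}(0, \tau^2 I)$ on the parameters and maximizing the posterior instead of the likelihood recovers ordinary L2 weight decay. A minimal sketch with the same kind of synthetic setup (the hyperparameter values are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (rng.uniform(size=200) < 1 / (1 + np.exp(-X @ np.array([1.5, -2.0])))).astype(float)

# Gaussian prior w ~ N(0, tau^2 I): the log-posterior is
# log p(data|w) - ||w||^2 / (2 tau^2) + const, so the prior contributes
# an ordinary L2 weight-decay term to the gradient.
tau2 = 1.0            # prior variance; an assumed hyperparameter
w = np.zeros(2)
lr = 0.5
for _ in range(1000):
    p = 1 / (1 + np.exp(-X @ w))
    grad = (X.T @ (y - p) - w / tau2) / len(y)   # likelihood + prior gradients
    w += lr * grad

print("MAP estimate:", w)   # shrunk toward 0 relative to the MLE
```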

On the other hand, I have sometimes seen examples where the parameters of a statistical distribution (e.g., $\mu$ and $\sigma$ of a normal distribution) are framed as the outputs of a neural network and learned by training it. In this case, we assume the data come from a certain fixed family of distributions and learn its parameters via flexible ML modeling. This can also take the form of complicated hierarchical statistical models.
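A standard example of this is heteroscedastic regression: the network outputs $\mu(x)$ and $\log \sigma(x)$ of an assumed normal distribution and is trained by minimizing the Gaussian negative log-likelihood. A minimal PyTorch sketch, with synthetic data and an arbitrary architecture:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Synthetic heteroscedastic data: the noise level grows with |x|.
x = torch.linspace(-2, 2, 400).unsqueeze(1)
y = torch.sin(2 * x) + (0.1 + 0.3 * x.abs()) * torch.randn_like(x)

# The network outputs the parameters of a Normal distribution:
# a mean and a log standard deviation (so sigma = exp(log_sigma) > 0).
net = nn.Sequential(nn.Linear(1, 64), nn.Tanh(), nn.Linear(64, 2))
opt = torch.optim.Adam(net.parameters(), lr=1e-2)

for step in range(2000):
    out = net(x)
    mu, log_sigma = out[:, :1], out[:, 1:]
    # Negative log-likelihood of y under N(mu, sigma^2), dropping constants.
    nll = (log_sigma + 0.5 * ((y - mu) / log_sigma.exp()) ** 2).mean()
    opt.zero_grad()
    nll.backward()
    opt.step()
```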
