Bayesian Machine Learning – Applications of Bayesian Methods in Supervised Learning

bayesian, machine-learning, supervised-learning

I have started reading about Bayesian machine learning. I have experience with algorithms such as gradient boosting and random forests for supervised learning problems. To understand how the Bayesian approach differs, I would like to look at an example application of Bayesian methods to a classical supervised learning problem, such as house price prediction (for example, the one in this Kaggle competition). However, I could not find such an example application. Is that because the Bayesian approach is not suitable for this problem? If not, what kinds of problems is it suited to? If it is suitable, how can I solve this problem using the Bayesian approach? Thanks

Best Answer

It depends on what you mean by "Bayesian machine learning". For example, Lasso regression is equivalent to Bayesian regression with Laplace priors on the parameters; ridge regression is equivalent to assuming Gaussian priors; using Naive Bayes with Laplace smoothing is like assuming a uniform prior for the smoothed probabilities; and so on. Many machine learning models can be interpreted as special cases of Bayesian models. In scikit-learn you can find the BayesianRidge regressor, which works nearly the same as the Ridge regressor.
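As a quick illustration of that last point, here is a minimal sketch on made-up synthetic data (not from the question's Kaggle competition): with enough data, Ridge and BayesianRidge produce nearly identical point estimates, even though BayesianRidge learns its regularization strength from the data rather than taking a fixed alpha.

```python
import numpy as np
from sklearn.linear_model import BayesianRidge, Ridge

# Synthetic regression data with known coefficients
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)

ridge = Ridge(alpha=1.0).fit(X, y)        # fixed regularization strength
bridge = BayesianRidge().fit(X, y)        # regularization inferred from the data

print(ridge.coef_)
print(bridge.coef_)
```

Both should recover coefficients close to (1.5, -2.0, 0.5); the practical difference shows up in small or noisy datasets, where the priors (and the inferred hyperparameters) matter more.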

The above examples refer to maximum a posteriori (MAP) estimation, i.e. finding only the mode of the posterior distribution, rather than full Bayesian estimation, where you would learn the entire posterior distribution. Learning the full posterior is more complicated, and you would need to approximate it in one of several ways, for example with the Laplace approximation, variational inference, or Markov chain Monte Carlo sampling. Note that those methods are more computationally intensive, are not available out-of-the-box in standard machine learning software, and in many cases require a deeper mathematical understanding of the model, which makes them less popular.
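To make the MCMC option concrete, here is a minimal random-walk Metropolis sketch (an illustration on synthetic data with assumed known noise scale, not a production sampler) that draws samples from the posterior over a single regression slope instead of reporting only its MAP value:

```python
import numpy as np

# Synthetic data: y = 2*x + noise, with noise scale assumed known below
rng = np.random.default_rng(1)
x = rng.normal(size=100)
y = 2.0 * x + rng.normal(scale=0.5, size=100)

def log_posterior(w, sigma=0.5, prior_sd=10.0):
    # Gaussian likelihood plus a wide Gaussian prior on the slope
    log_lik = -0.5 * np.sum((y - w * x) ** 2) / sigma**2
    log_prior = -0.5 * w**2 / prior_sd**2
    return log_lik + log_prior

# Random-walk Metropolis: propose a nearby slope, accept with the MH rule
samples, w = [], 0.0
for _ in range(5000):
    proposal = w + rng.normal(scale=0.2)
    if np.log(rng.uniform()) < log_posterior(proposal) - log_posterior(w):
        w = proposal
    samples.append(w)

posterior = np.array(samples[1000:])  # discard burn-in
print(posterior.mean(), posterior.std())
```

The output is a whole distribution over the slope (mean near 2.0, plus a spread quantifying uncertainty), which is exactly what MAP estimation throws away.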

Using the example of linear regression, the Bayesian equivalent does not differ very much from its frequentist counterpart. In every case where you could use linear regression, you could use the Bayesian flavor as well. The main differences are that in the Bayesian case you need to decide on priors for the parameters, you get uncertainty estimates for free, and the model is more computationally demanding to train than its non-Bayesian counterpart.
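Those "uncertainty estimates for free" can be seen directly in scikit-learn: BayesianRidge's predict accepts return_std=True and returns a predictive standard deviation per point. A small sketch on synthetic one-dimensional data (not the house price data from the question):

```python
import numpy as np
from sklearn.linear_model import BayesianRidge

# Training data observed only on [-3, 3]
rng = np.random.default_rng(2)
X = rng.uniform(-3, 3, size=(80, 1))
y = 1.0 + 0.7 * X[:, 0] + rng.normal(scale=0.3, size=80)

model = BayesianRidge().fit(X, y)

# Predictive mean and standard deviation at x=0 (in-range) and x=10 (far out)
mean, std = model.predict(np.array([[0.0], [10.0]]), return_std=True)
print(mean, std)
```

The predictive standard deviation is larger at x=10 than at x=0, because parameter uncertainty contributes more the further you extrapolate from the training data; a plain Ridge model would give a point prediction with no such warning.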
