Machine Learning Regression – What is the Importance of Probabilistic Machine Learning?

machine-learning, probability, regression

I am attending a course on "Introduction to Machine Learning" where, to my surprise, a large portion of the course takes a probabilistic approach to machine learning (ML): for example, treating linear and logistic regression probabilistically and finding the optimal weights using MLE, MAP, or Bayesian methods. But what is the importance of this?

We can do all of this with a non-probabilistic approach as well. My instructor only told me that "with a probabilistic view, we get more information about the data." I understand that much: for each point, rather than directly claiming something is true or false, we take the probability into account, which gives us a continuous value describing how much confidence to place in the prediction. But apart from this, what are the advantages of probabilistic machine learning, and what can serve as motivation for such a viewpoint?
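For concreteness, here is a minimal NumPy sketch of the kind of thing the course covers (the data are made up; I am assuming a Gaussian noise model, under which the MLE solution is ordinary least squares, and a zero-mean Gaussian prior on the weights, under which the MAP solution is ridge regression):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: y = X @ w_true + Gaussian noise
n, d = 50, 3
X = rng.normal(size=(n, d))
w_true = np.array([1.5, -2.0, 0.5])
y = X @ w_true + rng.normal(scale=0.5, size=n)

# MLE under Gaussian noise is ordinary least squares:
#   w_mle = argmax_w p(y | X, w) = (X^T X)^{-1} X^T y
w_mle = np.linalg.solve(X.T @ X, X.T @ y)

# MAP with a zero-mean Gaussian prior on w is ridge regression:
#   w_map = argmax_w p(y | X, w) p(w) = (X^T X + lam * I)^{-1} X^T y
lam = 1.0  # noise variance divided by prior variance (illustrative value)
w_map = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

print("MLE weights:", w_mle)
print("MAP weights:", w_map)
```

As lam goes to zero (an uninformative prior) the two point estimates coincide, and a fully Bayesian treatment would keep the whole posterior over w rather than any single point estimate. So what does the probabilistic framing buy us?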

Best Answer

Contemporary machine learning, as a field, requires more familiarity with Bayesian methods and with probabilistic mathematics than does traditional statistics or even the quantitative social sciences, where frequentist statistical methods still dominate. Those coming from physics are less likely to be surprised by the importance of probabilities in ML, since quantum physics is so thoroughly probabilistic (indeed, many key probabilistic algorithms are named after physicists). In fact, three of the leading ML textbooks (while all broad enough in their coverage to be considered fair overviews of ML) are written by authors who explicitly favor probabilistic methods (and MacKay and Bishop were both trained as physicists):

  • Kevin Murphy's Machine Learning: A Probabilistic Perspective (an encyclopedic, nearly comprehensive reference-style work)
  • Christopher Bishop's Pattern Recognition and Machine Learning (a rigorous introduction that assumes much less background knowledge)
  • David MacKay's Information Theory, Inference, and Learning Algorithms (foregrounding information theory, but thoroughly Bayesian in its treatment of inference and learning)

My point: the most widely used ML textbooks reflect the same probabilistic focus you describe in your Intro to ML course.

In terms of your specific question, Zoubin Ghahramani, another influential proponent of probabilistic ML, argues that the currently dominant, non-Bayesian style of ML--deep learning--suffers from six limitations that explicitly probabilistic, Bayesian methods often avoid:

  1. very data hungry
  2. very compute-intensive to train and deploy
  3. poor at representing uncertainty and knowing what they don't know
  4. easily fooled by adversarial examples
  5. finicky to optimize (non-convex, choice of architecture and hyperparameters)
  6. uninterpretable black boxes, lacking transparency, difficult to trust

Ghahramani elaborates on these points in many excellent tutorials and in his non-specialist overview article, "Probabilistic machine learning and artificial intelligence" (Nature, 2015).

Ghahramani's article emphasizes that probabilistic methods are crucial whenever you don't have enough data. He explains (section 7) that nonparametric Bayesian models can expand to match datasets of any size, with a potentially infinite number of parameters. And he notes that many datasets that may seem enormous (millions of training examples) are in fact large collections of small datasets, where probabilistic methods remain crucial for handling the uncertainties stemming from insufficient data. A similar thesis grounds Part III of the renowned book Deep Learning, where Ian Goodfellow, Yoshua Bengio, and Aaron Courville argue that "Deep Learning Research" must become probabilistic in order to become more data-efficient.
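To make "knowing what it doesn't know" concrete, here is a minimal sketch using scikit-learn's GaussianProcessRegressor (the data, kernel, and noise level are just illustrative choices). A Gaussian process is a standard example of the nonparametric Bayesian models Ghahramani describes: its effective capacity grows with the data, and its predictive uncertainty is small near the training points but large far away from them.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# A handful of noisy observations of sin(x) on [0, 5]
rng = np.random.default_rng(0)
X_train = rng.uniform(0.0, 5.0, size=(15, 1))
y_train = np.sin(X_train).ravel() + 0.1 * rng.normal(size=15)

# Nonparametric Bayesian regression: the GP's capacity is not fixed in
# advance but grows with the number of training points.
gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), alpha=0.1 ** 2)
gp.fit(X_train, y_train)

# Query one point inside the training range and one far outside it.
X_test = np.array([[2.5], [20.0]])
mean, std = gp.predict(X_test, return_std=True)
for x, m, s in zip(X_test.ravel(), mean, std):
    print(f"x = {x:5.1f}: prediction {m:+.2f} +/- {s:.2f}")

# Near the data the predictive standard deviation is small; at x = 20 it
# reverts to roughly the prior standard deviation, i.e. the model signals
# that it has never seen inputs like this and is only guessing.
```

The same idea, reporting a predictive distribution rather than a bare point estimate, is what lets a probabilistic model raise a flag when it is asked to extrapolate far beyond its training data.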

Because probabilistic models effectively "know what they don't know", they can help prevent terrible decisions based on unfounded extrapolations from insufficient data. As the questions we ask and the models we build become increasingly complex, the risks of insufficient data rise. And as the decisions we base on our ML models become increasingly high-stakes, the dangers associated with models that are confidently wrong (unable to pull back and say "hey, wait, I've never really seen inputs like this before") increase as well. Since both of those trends seem irreversible--ML growing in both popularity and importance--I expect probabilistic methods to become more and more widespread over time.

As long as our datasets remain small relative to the complexity of our questions and to the risks of giving bad answers, we should use probabilistic models that know their own limitations. The best probabilistic models have something analogous to our human capacity to recognize feelings of confusion and disorientation (registering huge or compounding uncertainties). They can effectively warn us when they are entering uncharted territory, and thereby prevent us from making potentially catastrophic decisions when they are nearing or exceeding their limits.
