Solved – Probabilistic programming vs “traditional” ML

bayesian, machine-learning, markov-chain-montecarlo, probabilistic-programming, pymc

I was browsing the GitHub repo for PyMC and found this notebook:

Variational Inference: Bayesian Neural Networks

The author extols the virtues of Bayesian/probabilistic programming but then goes on to say:

Unfortunately, when it comes to traditional ML problems like classification or (non-linear) regression, Probabilistic Programming often plays second fiddle (in terms of accuracy and scalability) to more algorithmic approaches like ensemble learning (e.g. random forests or gradient boosted regression trees).

Could someone please explain:

  1. Whether this statement is generally true
  2. If so, why it is true

Best Answer

  1. It's generally true in my personal experience as a professional data scientist.

  2. It's true in my personal experience because it's what I observe most of the time. If you're asking why it happens this way, it's for a few reasons:

     - Many traditional ML algorithms are nowadays available "off the shelf", including sophisticated ensemble methods, neural networks, etc. Probabilistic methods still often require bespoke solutions, either written in a DSL like Stan or directly in a general-purpose programming language (the first sketch after this list illustrates the contrast).

     - Many people entering data science nowadays are coming from engineering and natural science backgrounds, where they have strong mathematical and "algorithmic" skills but don't have as much experience or intuition with probability modeling. It's just not on their radar, and they aren't as comfortable with the methods and the software required to implement them.

     - Making a "hard" prediction from a probabilistic model involves either hand-waving or formal decision theory. AI researchers and highly-paid statistical consultants know this, and they embrace it. But for the rank-and-file data scientist, it's not so easy to turn to your manager and start speaking in terms of distributions and probabilities. The business (or the automated system you're building) just needs a damn answer. Life is a whole lot easier when you stop equivocating and commit to a single answer; and if you're going to collapse everything to a point estimate anyway, you might as well not have bothered with the probabilities in the first place. (The second sketch after this list shows what the decision-theory route looks like.)

     - Probabilistic modeling often ends up being very computationally intensive, especially Bayesian modeling where closed-form solutions are a rare luxury, and doubly so on "big" data sets. I wouldn't hesitate to run XGBoost on a data set with 10 million rows. I wouldn't even consider running a Stan model on a data set with 10 million rows.
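
To make the "off the shelf" vs. "bespoke" point concrete, here is a minimal sketch of the two workflows side by side. The data is synthetic, the priors are arbitrary choices for illustration, and a PyMC v5-style API is assumed; the point is only that the ensemble is one constructor call while the probabilistic model has to be written out (priors, likelihood, sampler) by hand.

```python
# Sketch only: contrasts the "off the shelf" workflow with a hand-written
# probabilistic model. Data is synthetic; priors are arbitrary illustrations.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=1000, n_features=5, random_state=0)

# Off-the-shelf ensemble: the entire model specification is one constructor call.
clf = GradientBoostingClassifier().fit(X, y)
hard_predictions = clf.predict(X)

# Bespoke probabilistic model (PyMC v5-style API assumed): you write the
# priors and likelihood yourself, then pay for MCMC to get a posterior.
import pymc as pm

with pm.Model():
    beta = pm.Normal("beta", mu=0.0, sigma=1.0, shape=X.shape[1])
    intercept = pm.Normal("intercept", mu=0.0, sigma=1.0)
    p = pm.math.sigmoid(pm.math.dot(X, beta) + intercept)
    pm.Bernoulli("obs", p=p, observed=y)
    idata = pm.sample(draws=1000, tune=1000)
```

Neither snippet does anything clever; the asymmetry in how much modeling you have to specify yourself is the whole point.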
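And for the decision-theory point: once the model hands you a probability, producing the hard answer formally means specifying losses and choosing the action with the smallest expected loss. The churn scenario and the loss numbers below are made up purely for illustration.

```python
# Sketch of the formal decision-theory route: collapse a probability into a
# hard decision by minimizing expected loss. All numbers here are made up.

# Suppose the probabilistic model reports P(customer churns) = 0.3.
p_churn = 0.3

# Loss of each action under each true state (illustrative values only):
# intervening costs 10 either way; missing a churner costs 100.
loss = {
    "intervene":  {"churn": 10.0,  "stay": 10.0},
    "do_nothing": {"churn": 100.0, "stay": 0.0},
}

def expected_loss(action, p):
    """Expected loss of an action when P(churn) = p."""
    return p * loss[action]["churn"] + (1 - p) * loss[action]["stay"]

# The hard answer the business wants is the action with the lowest expected loss.
best_action = min(loss, key=lambda a: expected_loss(a, p_churn))
print(best_action)  # "intervene": 10 < 0.3 * 100 + 0.7 * 0 = 30
```

The hand-waving alternative is to threshold the probability at 0.5 and hope the implied losses are symmetric.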

Given all the downsides described above, a data scientist or a small team of data scientists can iterate much more quickly using less-probabilistic machine learning techniques, and get "good enough" results.

Edit: as pointed out in the comments, #1 and #2 could both be because probabilistic programming methods haven't yet been shown to have knockout performance on real-world problems. CNNs got popular because they blew away existing techniques.

Edit 2: it seems that probabilistic programming is getting popular for time series modeling, where deep learning doesn't seem as effective as in other domains.

Edit 3 (December 2020): probabilistic programming is now becoming much more popular in a wide variety of domains, as probabilistic programming languages get better and the solvers get better and faster.
