Solved – What are the downsides of bayesian neural networks

bayesian networkdeep learningmachine learningvariational-bayes

Bayesian neural nets (BNN) are very popular topic. With development of variational approximation it became possible to train such models much faster then with Monte Carlo sampling. BNNs allow such interesting features as natural regularisation and even uncertainty estimation. So, the question is: why haven't we still completely migrated on BNNs?

I can assume that variational inference does not provide enough accuracy. Is it the only reason?

Best Answer

There are still many disadvantages of BNN compared with NN as listed below:

  1. The computational cost is heavier. In here I am not just referring to the cost of training, i.e. getting the posterior distribution of all parameters. This part is in fact OK if you use variational inference with a simple distribution family for BNN parameters. After your model is deployed and you want to make an inference, then you will need to sample N parameters from their posterior distribution in order to get the distribution of output, and this is N times more computational cost than just using NN.
  2. The tools of BNN is not popularized yet and is not so automatic as tools of NN.
  3. You need to make some assumptions about your prior, which is relatively difficult for most users.
  4. This is the most important reason that I think why BNN is not adopted universally instead of NN: the uncertainty we get is not as useful as we thought at first glance. Let's take an easy example: say you have two types of customers. Type A will have equal probability of giving you \$40 or \$60, and Type B will have equal probability of giving you \$30 or \$70. They have equal expectation, but larger uncertainty for the type A customer. Assume your BNN works perfectly well to tell one distribution from another. However, the uncertainty here does not matter if you have one million customers of each, because at that time what matters is not the uncertainty of individual customers, but the uncertainty of average, which goes towards zero when your number of customers goes larger according to law of large numbers. Therefore, you really do not need uncertainty in your model most of the time.