Solved – Bayesian recurrent neural network with Keras and PyMC3/Edward

bayesian, neural-networks, pymc

I have a very simple toy recurrent neural network implemented in Keras which, given an input of N integers, returns their mean value. I would like to modify this into a Bayesian neural network with either PyMC3 or the Edward library, so that I can get a posterior distribution on the output value,

e.g. p(output | weights).
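
For concreteness, here is a minimal sketch of the kind of toy model described above; the layer sizes and training setup are placeholders, not the exact implementation:

```python
import numpy as np
from keras.models import Sequential
from keras.layers import SimpleRNN, Dense

N = 10  # sequence length: one integer per timestep

# Toy regression: read a sequence of N integers, output their mean.
model = Sequential([
    SimpleRNN(16, input_shape=(N, 1)),
    Dense(1),
])
model.compile(optimizer="adam", loss="mse")

X = np.random.randint(0, 100, size=(5000, N, 1)).astype("float32")
y = X.mean(axis=1)  # target: the mean of each sequence
model.fit(X, y, epochs=5, batch_size=64, verbose=0)
```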

I have read through blog posts and examples for autograd, PyMC3, and Edward [1, 2, 3], but they all seem geared toward classification problems.

Cheers

[1] https://github.com/HIPS/autograd/blob/master/examples/bayesian_neural_net.py

[2] http://twiecki.github.io/blog/2016/06/01/bayesian-deep-learning/

[3] https://github.com/blei-lab/edward/blob/master/examples/getting_started_example.py

Edit

To clarify: I am asking whether anyone can offer experience, advice, or references relevant to building a Bayesian RNN for anything other than a classification task.

Best Answer

From a pure implementation perspective it should be straightforward: take your model code, replace every trainable Variable creation with ed.Normal(...) (or something similar), define a matching variational posterior for each prior, zip priors and posteriors into a dict, feed that to one of Edward's inference objects, et voilà.
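
A minimal sketch of that pattern, using the Edward 1.x API and a non-recurrent toy regression for brevity (the toy data and shapes here are made up; the same substitution would apply to an RNN's weight matrices, just with more bookkeeping):

```python
import numpy as np
import tensorflow as tf
import edward as ed
from edward.models import Normal

N, D = 500, 5  # number of examples, input dimension

# Toy data: the target is the mean of the inputs.
X_train = np.random.randint(0, 10, size=(N, D)).astype(np.float32)
y_train = X_train.mean(axis=1)

# Priors take the place of what would otherwise be trainable Variables.
w = Normal(loc=tf.zeros(D), scale=tf.ones(D))
b = Normal(loc=tf.zeros(1), scale=tf.ones(1))

X = tf.placeholder(tf.float32, [N, D])
y = Normal(loc=ed.dot(X, w) + b, scale=0.1 * tf.ones(N))

# One variational posterior per prior.
qw = Normal(loc=tf.get_variable("qw/loc", [D]),
            scale=tf.nn.softplus(tf.get_variable("qw/scale", [D])))
qb = Normal(loc=tf.get_variable("qb/loc", [1]),
            scale=tf.nn.softplus(tf.get_variable("qb/scale", [1])))

# Zip priors and posteriors into a dict and hand it to an inference object.
inference = ed.KLqp({w: qw, b: qb}, data={X: X_train, y: y_train})
inference.run(n_samples=5, n_iter=1000)

# Posterior predictive: the same graph with priors swapped for posteriors.
y_post = ed.copy(y, {w: qw, b: qb})
```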

The problem is that variational training of RNNs is quite hard, because it relies on sampling: the sampling noise becomes troublesome as soon as it is amplified by the recurrent net's dynamics. To my knowledge, there is currently no "gold standard" for how to do this in general.

The starting point is probably Alex Graves's paper [1]; some recent work has been done by Yarin Gal [2], who interprets dropout as variational inference. That approach gives you a predictive distribution by integrating out the dropout noise.

The latter will probably be the easiest to get working, but I have no practical experience with it myself.
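
If you go the dropout route, a rough sketch of what test-time "MC dropout" could look like with a Keras model. Here `model` is assumed to be your already-compiled RNN built with dropout / recurrent_dropout, and the helper name is mine:

```python
import numpy as np
from keras import backend as K

# Keeping the learning phase set to 1 at prediction time keeps the dropout
# masks active, so each forward pass is one sample from the approximate
# predictive distribution.
mc_forward = K.function([model.input, K.learning_phase()], [model.output])

def mc_dropout_predict(x, n_samples=100):
    samples = np.stack([mc_forward([x, 1])[0] for _ in range(n_samples)])
    # Predictive mean and a crude measure of predictive spread.
    return samples.mean(axis=0), samples.std(axis=0)
```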

  • [1] Graves, Alex. "Practical variational inference for neural networks." Advances in Neural Information Processing Systems. 2011.
  • [2] Gal, Yarin. "A theoretically grounded application of dropout in recurrent neural networks." arXiv preprint arXiv:1512.05287 (2015).