You can indeed compute the likelihood of a model using model.logp(). As input, it requires a point (a dict mapping variable names to values). For example, for the BEST model from the examples directory I can do:
import numpy as np

# `model` is the BEST model built in the examples directory.
np.exp(model.logp({'group1_mean': 0.1,
                   'group2_mean': 0.2,
                   'group1_std_interval': 1.,
                   'group2_std_interval': 1.2,
                   'nu_minus_one_log': 1}))
Note that this model uses transformed variables, so those are the names I have to supply. The snippet above already takes the exp() of the log-probability, and you could use that density inside a numerical integrator, for example one provided by scipy.integrate. The problem is that even with only 5 parameters, this will be very slow.
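To make the cost concrete, here is a minimal sketch of such an integration with scipy.integrate, assuming the model object from above; the bounds and the values held fixed are illustrative, and even this reduced two-dimensional version is slow:

import numpy as np
from scipy import integrate

# Integrate the exponentiated log-probability over the two means only,
# holding the remaining (transformed) parameters fixed.
def density(m1, m2):
    return np.exp(model.logp({'group1_mean': m1,
                              'group2_mean': m2,
                              'group1_std_interval': 1.,
                              'group2_std_interval': 1.2,
                              'nu_minus_one_log': 1}))

# Bounds are illustrative; nquad nests one quadrature per dimension,
# so the cost grows exponentially with the number of parameters.
partial_integral, abserr = integrate.nquad(density, [(-5, 5), (-5, 5)])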
Bayes Factors are generally very difficult to compute because you have to integrate over the complete parameter space. There are some ideas for using MCMC samples to do that; see this post, and especially its comment section, for more information: https://radfordneal.wordpress.com/2008/08/17/the-harmonic-mean-of-the-likelihood-worst-monte-carlo-method-ever/ The case for BIC is unfortunately similar.
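For illustration only, here is roughly what that harmonic mean estimator looks like; it is shown to make the idea concrete, not as a recommendation, since the post above explains why it fails:

import numpy as np

def harmonic_mean_log_evidence(log_likes):
    # log_likes: log p(data | theta_i) at each posterior sample theta_i.
    # Estimate: p(data) ~= 1 / mean(1 / p(data | theta_i)), computed in
    # log space for stability. The estimator can have infinite variance,
    # which is exactly the failure mode the post describes.
    neg = -np.asarray(log_likes)
    m = neg.max()
    return -(m + np.log(np.exp(neg - m).mean()))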
If you really want to compute the Bayes Factor, you can also look at the Savage-Dickey ratio test (see e.g. http://drsmorey.org/bibtex/upload/Wagenmakers:etal:2010.pdf), but its application is limited to nested models.
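As a sketch of what that looks like in practice, assuming a point null delta = 0 nested within the full model and posterior samples for delta (the names here are illustrative):

from scipy import stats

def savage_dickey_bf01(posterior_samples, prior_density_at_null):
    # Savage-Dickey: BF_01 = p(delta = 0 | data) / p(delta = 0).
    # The posterior density at the null value is estimated with a KDE.
    posterior_at_null = stats.gaussian_kde(posterior_samples)(0.)[0]
    return posterior_at_null / prior_density_at_null

# e.g. with a Normal(0, 1) prior on delta:
# bf01 = savage_dickey_bf01(trace['delta'], stats.norm.pdf(0., 0., 1.))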
I suppose that you're trying to do model comparison, which is a field with many opinions and solutions (some hard to implement, like BFs). One measure that is very easy to compute is the Deviance Information Criterion. It has its downsides, although some of them can be remedied (see http://onlinelibrary.wiley.com/doi/10.1111/rssb.12062/abstract). Unfortunately we haven't ported the code to pymc3 yet, but it would be pretty easy (see here for the pymc2 implementation: https://github.com/pymc-devs/pymc/blob/895c24f62b9f5d786bce7ac4fe88edb4ad220364/pymc/MCMC.py#L410).
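The core of that computation is only a few lines, so porting it by hand is easy; here is a sketch that mirrors the usual formula, assuming you have the model's log-likelihood at each posterior sample and at the posterior mean:

import numpy as np

def dic(sample_log_likes, log_like_at_mean):
    # DIC = 2 * mean(deviance) - deviance(posterior mean),
    # where deviance = -2 * log-likelihood.
    mean_deviance = -2. * np.mean(sample_log_likes)
    deviance_at_mean = -2. * log_like_at_mean
    return 2. * mean_deviance - deviance_at_mean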
Kruschke favors the approach of just building the full model and letting it tell you which parameters matter. You could also build variable selection into the model itself (see e.g. http://arxiv.org/pdf/math/0505633.pdf), as sketched below.
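A hypothetical pymc3 sketch of that idea, using Bernoulli inclusion indicators on the coefficients of a linear model (X, y, and the number of predictors k stand in for your data):

import pymc3 as pm
import theano.tensor as tt

with pm.Model() as model:
    xi = pm.Bernoulli('xi', p=0.5, shape=k)           # inclusion indicators
    beta = pm.Normal('beta', mu=0., sd=10., shape=k)  # coefficient "slab"
    sigma = pm.HalfNormal('sigma', sd=1.)
    # Deselected coefficients (xi == 0) drop out of the regression mean.
    mu = tt.dot(X, xi * beta)
    y_obs = pm.Normal('y_obs', mu=mu, sd=sigma, observed=y)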
Finally, for a much more complete treatment, see this recent blog post: http://jakevdp.github.io/blog/2015/08/07/frequentism-and-bayesianism-5-model-selection/
The reason is that there is currently a bug when using dropout in RNNs: for some reason dropout is still applied at prediction time.
We are looking into that issue and it will hopefully be fixed soon. For now you can just remove the dropout parameter from your LSTM layer.
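For example (a hypothetical sketch assuming the Keras 2 layer arguments; shapes are illustrative):

from keras.models import Sequential
from keras.layers import LSTM, Dense

model = Sequential()
# Affected by the bug -- dropout would also be applied at prediction time:
# model.add(LSTM(64, dropout=0.2, recurrent_dropout=0.2, input_shape=(10, 8)))
model.add(LSTM(64, input_shape=(10, 8)))  # workaround: no dropout for now
model.add(Dense(1))
model.compile(loss='mse', optimizer='adam')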
From a pure implementation perspective, it should be straightforward: take your model code, replace every trainable Variable creation with ed.Normal(...) or something similar, establish variational posteriors as well, zip them into a dict, and feed that to one of edward's inference objects, et voilà. The problem is that variational training of RNNs is quite hard, because it is based on sampling: the sampling noise is no fun once it gets amplified by the recurrent net's dynamics. To my knowledge, there is currently no "gold standard" for how to do this in general.
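A minimal sketch of that recipe for a single weight matrix, assuming Edward's KLqp inference on TensorFlow 1.x; the shapes are illustrative, and y / y_train stand in for the model's output distribution and the observed data:

import tensorflow as tf
import edward as ed
from edward.models import Normal

n_in, n_out = 8, 4  # illustrative shapes

# Prior that replaces what would have been a trainable tf.Variable.
W = Normal(loc=tf.zeros([n_in, n_out]), scale=tf.ones([n_in, n_out]))

# Variational posterior with trainable parameters.
qW = Normal(loc=tf.Variable(tf.zeros([n_in, n_out])),
            scale=tf.nn.softplus(tf.Variable(tf.zeros([n_in, n_out]))))

# Zip priors and posteriors into a dict and hand it to the inference.
inference = ed.KLqp({W: qW}, data={y: y_train})
inference.run(n_iter=1000)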
The starting point is probably Alex Graves's paper [1]; some more recent work has been done by Yarin Gal [2], where dropout is interpreted as variational inference. That approach gives you a predictive distribution by integrating out the dropout noise.
The latter one will probably be the easiest to get to work, but I have no practical experience with it myself.
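For completeness, a minimal sketch of the latter (MC dropout) approach in Keras, assuming a trained model that contains dropout layers: keep dropout active at prediction time and average several stochastic forward passes.

import numpy as np
from keras import backend as K

# Run the model in "training" phase so that dropout stays active.
stochastic_forward = K.function([model.input, K.learning_phase()],
                                [model.output])

# Approximate the predictive distribution with 100 stochastic passes.
samples = np.stack([stochastic_forward([x_test, 1])[0] for _ in range(100)])
pred_mean = samples.mean(axis=0)
pred_std = samples.std(axis=0)  # predictive uncertainty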