You can indeed compute the likelihood of a model using model.logp(). As input, it requires a point (a dict mapping variable names to values). For example, for the BEST model from the examples directory I can do:
import numpy as np

# `model` is the BEST model built in the examples directory.
np.exp(model.logp({'group1_mean': 0.1,
                   'group2_mean': 0.2,
                   'group1_std_interval': 1.,
                   'group2_std_interval': 1.2,
                   'nu_minus_one_log': 1}))
Note that this model uses transformed variables, so those are the names I have to supply. The snippet above already takes the exp() of the log-probability, and you could use that density inside a numerical integrator, for example one provided by scipy.integrate. The problem is that even with only 5 parameters, this will be very slow.
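To make the cost concrete, here is a minimal sketch of such an integration with scipy.integrate, assuming the model object from above; the bounds and the values held fixed are illustrative, and even this reduced two-dimensional version is slow:

import numpy as np
from scipy import integrate

# Integrate the exponentiated log-probability over the two means only,
# holding the remaining (transformed) parameters fixed.
def density(m1, m2):
    return np.exp(model.logp({'group1_mean': m1,
                              'group2_mean': m2,
                              'group1_std_interval': 1.,
                              'group2_std_interval': 1.2,
                              'nu_minus_one_log': 1}))

# Bounds are illustrative; nquad nests one quadrature per dimension,
# so the cost grows exponentially with the number of parameters.
partial_integral, abserr = integrate.nquad(density, [(-5, 5), (-5, 5)])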
Bayes Factors are generally very difficult to compute because you have to integrate over the complete parameter space. There are some ideas for using MCMC samples to do that; see this post, and especially its comment section, for more information: https://radfordneal.wordpress.com/2008/08/17/the-harmonic-mean-of-the-likelihood-worst-monte-carlo-method-ever/ The case for BIC is unfortunately similar.
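For illustration only, here is roughly what that harmonic mean estimator looks like; it is shown to make the idea concrete, not as a recommendation, since the post above explains why it fails:

import numpy as np

def harmonic_mean_log_evidence(log_likes):
    # log_likes: log p(data | theta_i) at each posterior sample theta_i.
    # Estimate: p(data) ~= 1 / mean(1 / p(data | theta_i)), computed in
    # log space for stability. The estimator can have infinite variance,
    # which is exactly the failure mode the post describes.
    neg = -np.asarray(log_likes)
    m = neg.max()
    return -(m + np.log(np.exp(neg - m).mean()))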
If you really want to compute the Bayes Factor, you can also look at the Savage-Dickey ratio test (see e.g. http://drsmorey.org/bibtex/upload/Wagenmakers:etal:2010.pdf), but its application is limited to nested models.
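As a sketch of what that looks like in practice, assuming a point null delta = 0 nested within the full model and posterior samples for delta (the names here are illustrative):

from scipy import stats

def savage_dickey_bf01(posterior_samples, prior_density_at_null):
    # Savage-Dickey: BF_01 = p(delta = 0 | data) / p(delta = 0).
    # The posterior density at the null value is estimated with a KDE.
    posterior_at_null = stats.gaussian_kde(posterior_samples)(0.)[0]
    return posterior_at_null / prior_density_at_null

# e.g. with a Normal(0, 1) prior on delta:
# bf01 = savage_dickey_bf01(trace['delta'], stats.norm.pdf(0., 0., 1.))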
I suppose that you're trying to do model comparison, which is a field with many opinions and solutions (some hard to implement, like BFs). One measure that is very easy to compute is the Deviance Information Criterion. It has its downsides, although some of them can be remedied (see http://onlinelibrary.wiley.com/doi/10.1111/rssb.12062/abstract). Unfortunately we haven't ported the code to pymc3 yet, but it would be pretty easy (see here for the pymc2 implementation: https://github.com/pymc-devs/pymc/blob/895c24f62b9f5d786bce7ac4fe88edb4ad220364/pymc/MCMC.py#L410).
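The core of that computation is only a few lines, so porting it by hand is easy; here is a sketch that mirrors the usual formula, assuming you have the model's log-likelihood at each posterior sample and at the posterior mean:

import numpy as np

def dic(sample_log_likes, log_like_at_mean):
    # DIC = 2 * mean(deviance) - deviance(posterior mean),
    # where deviance = -2 * log-likelihood.
    mean_deviance = -2. * np.mean(sample_log_likes)
    deviance_at_mean = -2. * log_like_at_mean
    return 2. * mean_deviance - deviance_at_mean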
Kruschke favors the approach of just building the full model and letting it tell you which parameters matter. You could also build variable selection into the model itself (see e.g. http://arxiv.org/pdf/math/0505633.pdf), as sketched below.
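A hypothetical pymc3 sketch of that idea, using Bernoulli inclusion indicators on the coefficients of a linear model (X, y, and the number of predictors k stand in for your data):

import pymc3 as pm
import theano.tensor as tt

with pm.Model() as model:
    xi = pm.Bernoulli('xi', p=0.5, shape=k)           # inclusion indicators
    beta = pm.Normal('beta', mu=0., sd=10., shape=k)  # coefficient "slab"
    sigma = pm.HalfNormal('sigma', sd=1.)
    # Deselected coefficients (xi == 0) drop out of the regression mean.
    mu = tt.dot(X, xi * beta)
    y_obs = pm.Normal('y_obs', mu=mu, sd=sigma, observed=y)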
Finally, for a much more complete treatment, see this recent blog post: http://jakevdp.github.io/blog/2015/08/07/frequentism-and-bayesianism-5-model-selection/
The reason is that there is currently a bug when using dropout in RNNs: for some reason dropout is still applied at prediction time.
We are looking into that issue and it will hopefully be fixed soon. For now you can just remove the dropout parameter from your LSTM layer.
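For example (a hypothetical sketch assuming the Keras 2 layer arguments; shapes are illustrative):

from keras.models import Sequential
from keras.layers import LSTM, Dense

model = Sequential()
# Affected by the bug -- dropout would also be applied at prediction time:
# model.add(LSTM(64, dropout=0.2, recurrent_dropout=0.2, input_shape=(10, 8)))
model.add(LSTM(64, input_shape=(10, 8)))  # workaround: no dropout for now
model.add(Dense(1))
model.compile(loss='mse', optimizer='adam')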
From a pure implementation perspective, it should be straightforward: take your model code, replace every trainable Variable creation with ed.Normal(...) or something similar, establish variational posteriors as well, zip them into a dict, and feed that to one of edward's inference objects, et voilà. The problem is that variational training of RNNs is quite hard, because it is based on sampling: the sampling noise is no fun once it gets amplified by the recurrent net's dynamics. To my knowledge, there is currently no "gold standard" for how to do this in general.
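A minimal sketch of that recipe for a single weight matrix, assuming Edward's KLqp inference on TensorFlow 1.x; the shapes are illustrative, and y / y_train stand in for the model's output distribution and the observed data:

import tensorflow as tf
import edward as ed
from edward.models import Normal

n_in, n_out = 8, 4  # illustrative shapes

# Prior that replaces what would have been a trainable tf.Variable.
W = Normal(loc=tf.zeros([n_in, n_out]), scale=tf.ones([n_in, n_out]))

# Variational posterior with trainable parameters.
qW = Normal(loc=tf.Variable(tf.zeros([n_in, n_out])),
            scale=tf.nn.softplus(tf.Variable(tf.zeros([n_in, n_out]))))

# Zip priors and posteriors into a dict and hand it to the inference.
inference = ed.KLqp({W: qW}, data={y: y_train})
inference.run(n_iter=1000)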
The starting point is probably Alex Graves's paper [1]; some more recent work has been done by Yarin Gal [2], where dropout is interpreted as variational inference. That approach gives you a predictive distribution by integrating out the dropout noise.
The latter one will probably be the easiest to get to work, but I have no practical experience with it myself.
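For completeness, a minimal sketch of the latter (MC dropout) approach in Keras, assuming a trained model that contains dropout layers: keep dropout active at prediction time and average several stochastic forward passes.

import numpy as np
from keras import backend as K

# Run the model in "training" phase so that dropout stays active.
stochastic_forward = K.function([model.input, K.learning_phase()],
                                [model.output])

# Approximate the predictive distribution with 100 stochastic passes.
samples = np.stack([stochastic_forward([x_test, 1])[0] for _ in range(100)])
pred_mean = samples.mean(axis=0)
pred_std = samples.std(axis=0)  # predictive uncertainty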