Bayesian Analysis – Why Is the Bayes Factor Sometimes More Important Than the Posterior Odds?

bayesian, odds-ratio

To the best of my knowledge, the posterior odds satisfies the equation: $$(\text{posterior odds}) = (\text{Bayes factor}) \times (\text{prior odds}) $$ This is a simple consequence of Bayes' rule.
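For concreteness, with made-up numbers: a Bayes factor of $3$ combined with prior odds of $1/2$ gives posterior odds of $3 \times \tfrac{1}{2} = \tfrac{3}{2}$, i.e. the data shift the prior odds by exactly the factor coming from the likelihoods.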

The whole point of Bayesian inference applied to model selection, or so I thought, is to use the information in the prior probabilities to obtain a more accurate estimate of the correct answer than the naive estimate one gets from the likelihoods alone, which is given by the Bayes factor.

However, I recall having read several papers where the Bayes factors were reported as evidence of one model being more likely than the other.

Was the idea of the papers' authors to appeal to frequentists, who would have considered it taboo to incorporate information from prior probabilities, and to show that their results were robust to such objections on methodological/philosophical grounds?

Would a Bayesian ever be more interested in the Bayes factor than the posterior odds?

Note: I had these questions while reading the first chapter of James Stone's "Bayes' Rule: A Tutorial Introduction to Bayesian Analysis" and while thinking back to some papers I had read a while ago about influenza virus transmission. I can try to find the paper if that would help.

Anyway, I am a complete novice at this, so I apologize in advance if this question is nonsensical.

Best Answer

The Bayes factor is defined on hypotheses, not parameter values.

For hypotheses $H_1$ and $H_2$, with observed data $Y$, we define the Bayes factor as $\frac{P\left( Y\ |\ H_1 \right)}{P\left( Y\ |\ H_2 \right)}$. When $H_1$ and $H_2$ are point hypotheses, it is in fact just a likelihood ratio, and when the hypotheses are nested this is the same likelihood ratio used in the standard likelihood-ratio tests.
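As a minimal sketch of the point-hypothesis case (the data, $14$ successes in $20$ trials, and the two parameter values are made up for illustration), the Bayes factor reduces to a plain ratio of likelihoods:

```python
# Hypothetical example: Bayes factor for two *point* hypotheses about a
# binomial success probability theta. For point hypotheses the Bayes
# factor is simply the ratio of the two likelihoods.
from scipy.stats import binom

y, n = 14, 20                  # made-up data: 14 successes in 20 trials
theta_1, theta_2 = 0.7, 0.5    # H1: theta = 0.7, H2: theta = 0.5

bf_12 = binom.pmf(y, n, theta_1) / binom.pmf(y, n, theta_2)
print(f"BF_12 = {bf_12:.2f}")  # values > 1 favour H1
```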

But we generally aren't interested in Bayes factors of point hypotheses in nested models. We want to compare model specifications wholesale, which is something we cannot do with likelihoods alone. The Bayes factor makes this possible because an "un-fitted" statistical model $H$ is effectively a compound hypothesis over the entire parameter space $\Theta$ of that model. By this logic, we can treat any model as an hypothesis and use the law of total probability to obtain $$ P\left( Y\ |\ H \right) = \int_\Theta P\left( Y\ |\ \theta, H \right) P\left( \theta\ |\ H \right)\ \mathrm{d}\theta, $$ which is clearly not the same thing as the maximum likelihood $\max_{\theta \in \Theta} P\left( Y\ |\ \theta, H \right)$. It should be obvious from this definition that the Bayes factor does depend on one's choice of priors, and heavily so. In fact, Bayes factors can be used to compare the plausibility of different priors for otherwise identical models (philosophical concerns notwithstanding).
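Here is a small sketch of that distinction (the data are the same made-up $14/20$ as above, and the two Beta priors are assumptions chosen purely for illustration): it computes $P(Y\ |\ H)$ by numerically integrating the binomial likelihood against the prior, and shows that two models differing only in their prior yield different marginal likelihoods, hence a non-trivial Bayes factor.

```python
# Sketch: marginal likelihood of a *model* (theta ~ Beta(a, b) plus a
# binomial likelihood), obtained by integrating the likelihood over the
# prior, versus the maximum likelihood.
import numpy as np
from scipy.stats import binom, beta
from scipy.integrate import quad

y, n = 14, 20   # same made-up data as above

def marginal_likelihood(a, b):
    """P(Y | H) = integral of P(Y | theta, H) * P(theta | H) d theta."""
    integrand = lambda t: binom.pmf(y, n, t) * beta.pdf(t, a, b)
    value, _ = quad(integrand, 0.0, 1.0)
    return value

# Two otherwise-identical models that differ only in their prior on theta
m_flat  = marginal_likelihood(1, 1)     # uniform prior on theta
m_tight = marginal_likelihood(50, 50)   # prior concentrated near theta = 0.5

print("maximum likelihood    :", binom.pmf(y, n, y / n))
print("marginal, flat prior  :", m_flat)
print("marginal, tight prior :", m_tight)
print("Bayes factor (flat vs tight prior):", m_flat / m_tight)
```

Neither marginal likelihood equals the maximum likelihood, and the two priors give different Bayes factors even though the likelihood function is identical, which is exactly the prior sensitivity described above.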

You have in mind something like "the posterior odds of $\theta$ and $\theta'$", and therefore you are confused as to how a Bayes factor is any different from a likelihood ratio. What you need to consider is not the posterior odds of two specific parameter values $\theta$ and $\theta'$ but the posterior odds of entire models. Stats 101 implicitly trains us to think of hypotheses as numerical values. To understand Bayes factors, it is better to think of an hypothesis as a pair like $\left(M, \Theta\right)$, where $\Theta$ is a parameter space and $M$ is a representation of the model specification.
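At the level of entire models, the odds formula looks like this (the marginal likelihoods and prior model probabilities below are hypothetical placeholders, e.g. numbers of the kind produced by the integral sketched earlier):

```python
# Posterior odds of two whole models M1 and M2, not of two parameter
# values: combine the model-level Bayes factor with the prior odds
# assigned to the models themselves.
p_y_m1, p_y_m2 = 0.048, 0.040      # assumed marginal likelihoods P(Y|M1), P(Y|M2)
prior_m1, prior_m2 = 0.5, 0.5      # assume equal prior model probabilities

bayes_factor = p_y_m1 / p_y_m2
prior_odds = prior_m1 / prior_m2
posterior_odds = bayes_factor * prior_odds
print(f"BF_12 = {bayes_factor:.2f}, posterior odds = {posterior_odds:.2f}")
```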

You noted that the Bayes factor appears in the relation $(\text{posterior odds}) = (\text{Bayes factor}) \times (\text{prior odds})$. This isn't wrong, but your question evinces the danger of relying too heavily on a simple interpretation of a rich concept.

There are actually several very nice writeups and explanations of Bayes factors out there on the Internet that go into more depth than this answer.
