What are some practical objections to the use of Bayesian statistical methods in any context? No, I don't mean the usual carping about choice of prior. I'll be delighted if this gets no answers.
Bayesian – What Are the Cons of Bayesian Analysis?
Related Solutions
Just so we are clear, the idea is that I have data $Y \sim f(Y \mid \theta)$ and a prior $\theta \sim \pi(\theta \mid \eta)$. Then the joint is
$$ J(Y, \theta \mid \eta) = f(Y \mid \theta)\,\pi(\theta \mid \eta) $$
and the marginal of $Y$ is
$$ m(Y \mid \eta) = \int f(Y \mid \theta)\,\pi(\theta \mid \eta) \, d\theta. $$
The empirical Bayes approach, rather than specifying the value of $\eta$ or placing a prior on $\eta$, estimates
$$ \hat \eta = \arg \max_\eta m(Y \mid \eta). $$
Then, we draw inferences about $\theta$ from the "posterior" $\pi(\theta \mid Y, \hat \eta)$.
This describes parametric empirical Bayes. Maybe someone else can describe the situation for nonparametric empirical Bayes; I haven't dealt with it personally. The primary alternative to EB is to place a prior on $\eta$.
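To make the recipe concrete, here is a minimal sketch for one model where everything is available in closed form. The model is my assumption for illustration, not something fixed by the description above: $Y_i \mid \theta_i \sim N(\theta_i, \sigma^2)$ with $\sigma^2$ known, and $\theta_i \sim N(0, \eta)$. Marginally $Y_i \sim N(0, \sigma^2 + \eta)$, so maximizing $m(Y \mid \eta)$ gives $\hat\eta = \max(0, \overline{Y^2} - \sigma^2)$. The function name `eb_normal_means` is hypothetical.

```python
import random


def eb_normal_means(y, sigma2=1.0):
    """Parametric EB for y_i ~ N(theta_i, sigma2), theta_i ~ N(0, eta).

    Illustrative model only; sigma2 is assumed known.
    """
    n = len(y)
    # Marginally y_i ~ N(0, sigma2 + eta), so m(y | eta) is maximized at
    # eta_hat = max(0, mean(y_i^2) - sigma2).
    mean_sq = sum(v * v for v in y) / n
    eta_hat = max(0.0, mean_sq - sigma2)
    # Plug-in "posterior": theta_i | y_i, eta_hat ~ N(b * y_i, b * sigma2),
    # where b = eta_hat / (eta_hat + sigma2) is the shrinkage factor.
    shrink = eta_hat / (eta_hat + sigma2)
    post_mean = [shrink * v for v in y]
    post_var = shrink * sigma2
    return eta_hat, post_mean, post_var


# Simulated check: data generated with eta = 4 and sigma2 = 1.
rng = random.Random(0)
theta = [rng.gauss(0.0, 2.0) for _ in range(2000)]
y = [rng.gauss(t, 1.0) for t in theta]

eta_hat, post_mean, post_var = eb_normal_means(y)
mse_raw = sum((a - t) ** 2 for a, t in zip(y, theta)) / len(y)
mse_eb = sum((m - t) ** 2 for m, t in zip(post_mean, theta)) / len(y)
print(eta_hat, mse_raw, mse_eb)
```

Note that `post_var` is computed as though $\hat\eta$ had been known in advance; it carries no allowance for the fact that $\eta$ was estimated from the same data.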
Some Pros
The procedure is, in principle, automatic. No work needs to be done in eliciting a prior on $\eta$. Contrast this with choosing $\eta$ according to your prior knowledge about $\theta$, or using a hyperprior $\eta \sim \lambda(\eta \mid \gamma)$ (which will require specifying a value of $\gamma$). Subjectivity is always creeping in with these alternative approaches.
In practice, it can be very annoying to have to specify a prior. It can cause a lot of work on the part of the scientist. Empirical Bayes can shift the workload to the computer.
Related to the first point, I've found that this can stabilize results. Normally I would try to place a prior on $\eta$, but if my prior is too vague or out of line with the data, I find I can get some strange results. I'm more likely to get a sane answer with empirical Bayes. (Note: This is my personal experience with the models I've worked with; it is easy to imagine EB overfitting for the same reason maximum likelihood overfits.)
Some Cons
It is not always easy to implement. What you can get away with depends on the problem you are looking at: if you are in a machine learning setting and are using a variational approximation for inference, you can often do some approximate EB, but if you are doing MCMC it can be quite difficult to implement in a computationally attractive way. Under MCMC you can try to fake things with a stochastic search algorithm, but as far as I know the theory behind this hasn't really been worked out.
By plugging in a fixed point estimate $\hat \eta$ in place of $\eta$ and drawing inference from $\pi(\theta \mid Y, \hat \eta)$ as though we had specified $\hat \eta$ from the beginning, we are neglecting our uncertainty about $\eta$. There are ways to try to fix this, but mostly people just hope that it doesn't make a big difference. But if it really didn't make a difference, why not just put a hyperprior on $\eta$? This is especially suspect because the amount of information in the data about $\eta$ is often quite small.
It isn't clear what exactly we are doing from a statistical perspective. It isn't really Bayesian; at best, it is an approximation to Bayesian analysis. Hypothetically, if there were a prior on $\eta$ and the resulting posterior for $\eta$ were tightly concentrated, then EB would be an approximation to fully Bayesian inference, but this typically isn't the case. So what the heck is this procedure doing? It seems to me that if I'm using this I'm usually either being a fake Bayesian or I have some reason to believe that the frequentist properties of the method are good. The principled Bayesian approach would be to put a prior on $\eta$, and this can work better in practice.
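The point about neglected uncertainty can be written out, assuming for the sake of comparison that a hyperprior $\lambda(\eta)$ has been chosen (the notation $\pi(\eta \mid Y)$ for the resulting posterior of $\eta$ is mine). The fully Bayesian posterior averages over $\eta$:
$$ \pi(\theta \mid Y) = \int \pi(\theta \mid Y, \eta)\,\pi(\eta \mid Y)\,d\eta, \qquad \pi(\eta \mid Y) \propto m(Y \mid \eta)\,\lambda(\eta). $$
EB amounts to replacing $\pi(\eta \mid Y)$ with a point mass at $\hat\eta$. By the law of total variance,
$$ \operatorname{Var}(\theta \mid Y) = \mathbb{E}\big[\operatorname{Var}(\theta \mid Y, \eta) \mid Y\big] + \operatorname{Var}\big(\mathbb{E}[\theta \mid Y, \eta] \mid Y\big), $$
and the plug-in "posterior" reports only $\operatorname{Var}(\theta \mid Y, \hat\eta)$, which stands in for the first term. The second term, the part of the spread that comes from not knowing $\eta$, is dropped. It is negligible only when $\pi(\eta \mid Y)$ is tightly concentrated.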
Hope that helps. I actually like EB quite a bit as a method for finding procedures and evaluating them according to their frequentist properties when I'm wearing my statistics hat. It gives frequentists a nice tool for constructing methods with "sharing of information" in hierarchical models. Occasionally the properties of EB estimators are provably good (e.g. the Stein shrinkage estimator can be derived from an EB standpoint). In machine learning, of course, you often don't really care where procedures come from and just use whatever works.
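For the Stein example, here is a sketch of the standard derivation in the simplest setting, which I am assuming for illustration: $Y_i \mid \theta_i \sim N(\theta_i, 1)$ independently for $i = 1, \dots, p$ with $p \ge 3$, and $\theta_i \sim N(0, \eta)$. For known $\eta$ the posterior mean is
$$ \mathbb{E}[\theta_i \mid Y, \eta] = \left(1 - \frac{1}{1 + \eta}\right) Y_i. $$
Marginally $Y_i \sim N(0, 1 + \eta)$, so $\lVert Y \rVert^2 / (1 + \eta) \sim \chi^2_p$, and since $\mathbb{E}[1/\chi^2_p] = 1/(p - 2)$,
$$ \mathbb{E}\left[\frac{p - 2}{\lVert Y \rVert^2}\right] = \frac{1}{1 + \eta}. $$
Plugging this unbiased estimate of $1/(1+\eta)$ into the posterior mean gives
$$ \hat\theta = \left(1 - \frac{p - 2}{\lVert Y \rVert^2}\right) Y, $$
which is the James-Stein estimator. Note that this version estimates the shrinkage factor by an unbiased estimate from the marginal rather than by the $\arg\max$ of $m(Y \mid \eta)$, so it is EB in spirit rather than an exact instance of the maximum marginal likelihood recipe.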
Here are some links which may interest you comparing frequentist and Bayesian methods:
- http://www.stat.ufl.edu/archived/casella/Talks/BayesRefresher.pdf
- http://www.bayesian-inference.com/advantagesbayesian
- http://www.researchgate.net/post/Bayesian_vs_frequentist_statistics2
In a nutshell, the way I have understood it, given a specific set of data, the frequentist believes that there is a true, underlying distribution from which said data was generated. The inability to get the exact parameters is a function of finite sample size. The Bayesian, on the other hand, thinks that we start with some assumption about the parameters (even if unknowingly) and use the data to refine our opinion about those parameters. Both are trying to develop a model which can explain the observations and make predictions; the difference is in the assumptions (both actual and philosophical). As a pithy, non-rigorous statement, one can say the frequentist believes that the parameters are fixed and the data is random; the Bayesian believes the data is fixed and the parameters are random. Which is better or preferable? To answer that you have to dig in and realize just what assumptions each entails (e.g. are parameters asymptotically normal?).
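A toy illustration of the contrast, under assumptions of my own choosing: coin flips with unknown heads probability, a maximum likelihood point estimate on the frequentist side, and a conjugate Beta prior on the Bayesian side. The function names and the Beta(1, 1) default are hypothetical choices for the sketch.

```python
def mle(heads, n):
    """Frequentist point estimate: the parameter is fixed, the data are random."""
    return heads / n


def beta_posterior(heads, n, a=1.0, b=1.0):
    """Bayesian update: the data are fixed, the parameter gets a distribution.

    With a Beta(a, b) prior and binomial data, the posterior is
    Beta(a + heads, b + n - heads).
    """
    a_post = a + heads
    b_post = b + (n - heads)
    total = a_post + b_post
    mean = a_post / total
    var = a_post * b_post / (total ** 2 * (total + 1))
    return a_post, b_post, mean, var


# 7 heads in 10 flips, with a uniform Beta(1, 1) prior.
print(mle(7, 10))
print(beta_posterior(7, 10))
```

With 7 heads in 10 flips the maximum likelihood estimate is 7/10, while the posterior is Beta(8, 4) with mean 8/12; the two approaches differ here only through the prior, and the gap shrinks as the number of flips grows.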
Best Answer
I'm going to give you an answer. Four drawbacks actually. Note that none of these are actually objections that should drive one all the way to frequentist analysis, but there are cons to going with a Bayesian framework:
None of these things should stop you. Indeed, none of these things have stopped me, and hopefully doing Bayesian analysis will help address at least number 4.