NHST from the Bayesian point of view


In "How to become a Bayesian in eight easy steps," it is stated:

Frequentist statistics only allow for statements to be made about P(data | theory). Assuming the theory is correct, the probability of observing the obtained (or more extreme) data is evaluated. Dienes argues that often the probability of the data assuming a theory is correct is not the probability the researcher is interested in. What researchers typically want to know is P(theory | data): Given that the data were those obtained, what is the probability that the theory is correct?

My question is: from the Bayesian point of view, what is the value of the answers given by NHST (statements about P(data | theory))?
Does it give an answer to the wrong question? Is it of limited value, or completely useless?

Best Answer

If you're asking about "people who identify strongly as Bayesians," I'm sure you've seen that some individual Bayesians do believe NHST is mostly useless :-)

But if you're asking about "Bayesian methodology," statements about P(data|theory) are merely answers to a different question. There doesn't need to be a value judgment -- classical questions just aren't addressed by Bayesian methods and vice versa, just like a nail isn't addressed by a screwdriver or a screw isn't addressed by a hammer.

Classical statements about P(data|theory) deliberately focus on sampling variation, separating it out from other sources of uncertainty. They address questions that should be noncontroversial when a scientist is planning or reporting properties of their study design. My kitchen scale is accurate to within 1 gram, while my bathroom scale is only accurate to within 1 pound... Likewise with a scientific study design: Can we expect it to give a sample mean within 1 gram of the population mean, or only within 1 pound of the population mean?

Bayesian statements about P(theory|data,prior) deliberately combine sampling variation with your prior beliefs about the theory into new posterior beliefs. They should be noncontroversial when you sincerely believed a prior and then saw new data: how should you update that prior into a posterior? You might use that posterior to report a credible interval, for instance letting the rest of us know whether your posterior uncertainty about the mean is on the order of grams or pounds. But because you've averaged over a prior, this uncertainty is not trying to quantify the sampling variation alone, the way P(data|theory) did. So P(data|theory) is simply tackling a different question than finding a posterior.
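
As a minimal sketch of that update, assume a conjugate Normal-Normal model with known data SD; every number below (prior, SD, observations) is made up for illustration, not taken from any real study:

```python
# Conjugate Normal-Normal update with known data SD.
# All numbers are assumed, purely for illustration.
import numpy as np
from scipy import stats

prior_mean, prior_sd = 100.0, 10.0  # prior belief about mu (assumed)
sigma = 5.0                         # known data SD (assumed)
data = np.array([103.1, 98.4, 101.7, 99.9, 102.5])  # toy observations

n, xbar = len(data), data.mean()

# Posterior precision is the sum of prior and data precisions;
# the posterior mean is the precision-weighted average.
prior_prec = 1.0 / prior_sd**2
data_prec = n / sigma**2
post_var = 1.0 / (prior_prec + data_prec)
post_mean = post_var * (prior_prec * prior_mean + data_prec * xbar)

# 95% credible interval from the Normal posterior:
lo, hi = stats.norm.interval(0.95, loc=post_mean, scale=post_var**0.5)
print(f"posterior mean {post_mean:.2f}, 95% credible interval ({lo:.2f}, {hi:.2f})")
```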

The same person might care about both questions at different moments, of course.

  • When I'm pondering whether the study is precise / powerful enough, I might wear my Classical hat and look at P(data|theory). Then I won't average over a prior, if it's a distraction from my current question.
  • Or when I'm up against a deadline and have to make a decision / take an action based on the currently-available information, I might wear my Bayesian hat. Then I just need to pin down my prior, get a posterior, and use it to make my decision, regardless of whether or not the sampling variation is as low as I'd like it to be, so I won't be looking at P(data|theory) alone.

For a brief example of "how P(data|theory) answers whether the study is powerful enough", consider a one-sample Normal-based test of $H_0: \mu=\mu_0$ vs $H_A: \mu \neq \mu_0$. With Normally-distributed data, the power at a specific sample size $n$, alternative $\mu_A$, assumed standard deviation $\sigma$, and standard significance level $\alpha=0.05$ is given by $$P\left(\left|\frac{\bar{x}-\mu_0}{SE}\right| > 1.96 \,\middle|\, \mu=\mu_A,\ \sigma\right)$$ where $SE=\sigma/\sqrt{n}$. In other words, "For samples of size $n$, how often will we reject $H_0$ if the true mean is $\mu_A$ and the true SD is $\sigma$?" is equivalent to "For samples of size $n$, how often will we obtain a sample in which $\bar{x}$ falls at least 1.96 $SE$'s away from $\mu_0$, if the true mean is $\mu_A$ and the true SD is $\sigma$?", which is a specific instance of P(data|theory).
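
Here is a minimal sketch of that power calculation, both in closed form and by simulation; the values of $\mu_0$, $\mu_A$, $\sigma$, and $n$ below are illustrative placeholders, not values from the answer:

```python
# Closed-form power of the two-sided one-sample z-test, plus a Monte
# Carlo check of the same P(data|theory) statement.
import numpy as np
from scipy import stats

mu0, mu_A, sigma, n, alpha = 0.0, 0.5, 1.0, 50, 0.05  # all assumed
se = sigma / n**0.5
z_crit = stats.norm.ppf(1 - alpha / 2)  # 1.96 for alpha = 0.05
shift = (mu_A - mu0) / se               # distance of H_A from H_0, in SEs

# Under mu = mu_A, (xbar - mu0)/SE ~ Normal(shift, 1), so:
power = stats.norm.cdf(-z_crit - shift) + 1 - stats.norm.cdf(z_crit - shift)
print(f"power at n={n}: {power:.3f}")

# Simulate many samples of size n from the alternative and count how
# often xbar lands more than 1.96 SEs away from mu0:
rng = np.random.default_rng(0)
xbars = rng.normal(mu_A, sigma, size=(100_000, n)).mean(axis=1)
print(f"simulated power: {np.mean(np.abs((xbars - mu0) / se) > z_crit):.3f}")
```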

The Bayesian might say "I'm just not going to try answering that question. The question I'd prefer to ask is $P(\mu=\mu_A|data)$. I've got a prior on $\mu$ and $\sigma$ already, and I'm going to update it with the data in order to answer my question instead of yours. Or if I still have to choose a sample size, I'll choose $n$ big enough to ensure my posterior has a desired level of precision on average across my prior."
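
In the conjugate Normal model with known $\sigma$, that Bayesian sample-size rule even has a closed form, since the posterior SD is $\sigma/\sqrt{n_0+n}$ when the prior carries the weight of $n_0$ observations. A minimal sketch, with all numbers assumed:

```python
# Bayesian sample-size sketch, conjugate Normal model with known sigma.
# Here the posterior SD is sigma / sqrt(n0 + n) no matter what data
# arrive, so "precision on average across the prior" is exact.
# n0 is the prior's weight in pseudo-observations; all numbers assumed.
import math

sigma, n0 = 1.0, 5          # known SD and prior weight (assumed)
target_post_sd = 0.10       # desired posterior precision (assumed)

n = max(0, math.ceil((sigma / target_post_sd) ** 2 - n0))
print(f"choose n = {n}; posterior sd = {sigma / math.sqrt(n0 + n):.3f}")
```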

The Frequentist might respond "Go ahead. But I am not trying to ask what to believe after averaging across the prior. I am trying to ask what to expect at specific combinations of $n$, $\mu_A$, and $\sigma$. That way, I can choose $n$ large enough to have high power for ruling out $\mu_0$ even if $|\mu_A-\mu_0|$ is as small as such-and-such. I'll plan for the worst case, not for the average case."
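
And a sketch of that worst-case planning: search for the smallest $n$ whose power is at least 80% at the smallest effect $|\mu_A-\mu_0|$ we refuse to miss (the $\delta$ and $\sigma$ values below are assumed):

```python
# Worst-case frequentist planning: smallest n with at least 80% power
# at the smallest effect delta of interest. delta and sigma are assumed.
from scipy import stats

def power(n, delta, sigma, alpha=0.05):
    """Power of the two-sided one-sample z-test at effect size delta."""
    z = stats.norm.ppf(1 - alpha / 2)
    shift = delta / (sigma / n**0.5)
    return stats.norm.cdf(-z - shift) + 1 - stats.norm.cdf(z - shift)

delta, sigma = 0.25, 1.0    # minimal effect of interest and SD (assumed)
n = 2
while power(n, delta, sigma) < 0.80:
    n += 1
print(f"need n = {n} for 80% power at |mu_A - mu0| = {delta}")
```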


Finally, one more perspective: If you and your research community all agree on the same prior, the Bayesian can often get away with a smaller $n$ than the Frequentist can, because the prior is often mathematically equivalent to having a few extra observations. In this sense, the Bayesian asks "What's the precision of our estimate if we consider earlier data as well as the current study?" Meanwhile, the Frequentist asks "What's the precision of our estimate just from this current study?" Both questions can be useful; neither strictly dominates the other.
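
To make "the prior is like extra observations" concrete: in the conjugate Normal model, a prior with variance $\sigma^2/n_0$ contributes exactly the precision of $n_0$ pseudo-observations sitting at the prior mean. A minimal sketch with made-up numbers:

```python
# A prior with variance sigma^2 / n0 behaves like n0 extra observations
# in the conjugate Normal model. Numbers below are assumed.
import numpy as np

sigma, n0, prior_mean = 1.0, 10, 0.0  # prior worth n0 = 10 observations (assumed)
data = np.array([0.4, 0.1, 0.6, 0.3, 0.2])  # toy current-study data
n, xbar = len(data), data.mean()

# Posterior mean = pooled mean of the real observations plus n0
# pseudo-observations at the prior mean:
post_mean = (n0 * prior_mean + n * xbar) / (n0 + n)
post_sd = sigma / np.sqrt(n0 + n)     # as if we had n0 + n observations
print(f"posterior: mean {post_mean:.3f}, sd {post_sd:.3f}")
```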
