Solved – Probability that Null Hypothesis is True

bayesian, hypothesis-testing, probability

So, this may be a common question, but I’ve never found a satisfactory answer.

How do you determine the probability that the null hypothesis is true (or false)?

Let’s say you give students two different versions of a test and want to see whether the versions are equivalent. You perform a t-test and it gives a p-value of .02. What a nice p-value! That must mean it’s unlikely that the tests are equivalent, right? No. Unfortunately, it appears that P(results|null) doesn’t tell you P(null|results). The normal thing to do is to reject the null hypothesis when we encounter a low p-value, but how do we know that we are not rejecting a null hypothesis that is very likely true?

To give a silly example, I can design a test for ebola with a false-positive rate of .02: put 50 balls in a bucket and write “ebola” on one. If I test someone with this and they pick the “ebola” ball, the p-value (P(picking the ball | they don’t have ebola)) is .02, but I definitely shouldn’t reject the null hypothesis that they are ebola-free.
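To make the gap between P(results|null) and P(null|results) concrete, here is a minimal Python sketch of the ball-in-bucket test. The base rate of 1 in 100,000 for ebola is an assumed number purely for illustration; the point is that because drawing the “ebola” ball is independent of disease status, the posterior equals the prior, so the .02 p-value says nothing about P(null|results).

    # Minimal sketch of the ebola-ball example.
    # The prevalence (prior) below is an assumed number for illustration only.

    p_draw_given_healthy = 1 / 50   # the p-value: P(picking the ball | no ebola)
    p_draw_given_sick    = 1 / 50   # the draw is independent of disease status
    p_sick               = 1e-5     # assumed prior P(ebola)
    p_healthy            = 1 - p_sick

    # Bayes' theorem: P(no ebola | picked the ball)
    p_healthy_given_draw = (p_draw_given_healthy * p_healthy) / (
        p_draw_given_healthy * p_healthy + p_draw_given_sick * p_sick
    )

    print(f"p-value: {p_draw_given_healthy:.3f}")                  # 0.020
    print(f"P(no ebola | positive): {p_healthy_given_draw:.5f}")   # ~0.99999

Because the two likelihoods are equal, the “test” carries no information and the posterior probability of being ebola-free stays essentially at the prior, no matter how small the p-value looks.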

Things I’ve considered so far:

  1. Assuming P(null|results)~=P(results|null) – clearly false for some important applications.
  2. Accept or reject the hypothesis without knowing P(null|results) – But why are we accepting or rejecting it, then? Isn’t the whole point that we reject what we think is LIKELY false and accept what is LIKELY true?
  3. Use Bayes’ Theorem – But how do you get your priors? Don’t you end up back in the same place, trying to determine them experimentally? And picking them a priori seems very arbitrary. (A small sketch of how the prior drives the posterior appears after this list.)
  4. I found a very similar question here: stats.stackexchange.com/questions/231580/. The one answer there seems to say, basically, that it doesn't make sense to ask about the probability of a null hypothesis being true, since that's a Bayesian question. Maybe I'm a Bayesian at heart, but I can't imagine not asking that question. In fact, it seems that the most common misunderstanding of p-values is that they are the probability of a true null hypothesis. If you really can't ask this question as a frequentist, then my main question is #3: how do you get your priors without getting stuck in a loop?
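Regarding #3, the mechanics are simple once you commit to a prior P(H0) and a likelihood ratio (Bayes factor); the hard part is exactly where those numbers come from. The sketch below assumes a hypothetical Bayes factor of 3 in favour of the alternative (i.e., the observed data are taken to be three times more likely under H1 than under H0; that value is invented for illustration) and shows how the posterior P(H0|results) moves as the prior varies.

    # Posterior P(H0 | data) as a function of the prior P(H0), assuming a
    # hypothetical Bayes factor BF10 = 3 (data 3x more likely under H1).
    # The BF value is an assumption for illustration only.

    bf10 = 3.0  # assumed P(data | H1) / P(data | H0)

    for prior_h0 in (0.1, 0.3, 0.5, 0.7, 0.9):
        prior_odds_h0 = prior_h0 / (1 - prior_h0)
        posterior_odds_h0 = prior_odds_h0 / bf10   # odds form of Bayes' theorem
        posterior_h0 = posterior_odds_h0 / (1 + posterior_odds_h0)
        print(f"prior P(H0)={prior_h0:.1f} -> posterior P(H0)={posterior_h0:.2f}")

With a modest Bayes factor like this, the conclusion about P(H0|results) is driven largely by the prior, which is exactly the loop the question worries about.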

Edit:
Thank you for all the thoughtful replies. I want to address a couple common themes.

  1. Definition of probability: I'm sure there is a lot of literature on this, but my naive conception is something like "the belief that a perfectly rational being would have given the information" or "the betting odds that would maximize profit if the situation were repeated and the unknowns were allowed to vary".
  2. Can we ever know P(H0|results)? Certainly, this seems to be a tough question. I believe, though, that every probability is theoretically knowable, since probability is always conditional on the given information. Every event will either happen or not happen, so probability doesn't exist under full information; it only exists when the information is insufficient, and so it should be knowable from that information. For example, if I am told that someone has a coin and asked the probability of heads, I would say 50%. It may happen that the coin is weighted 70% toward heads, but I wasn't given that information, so the probability WAS 50% given the information I had, just as, even if the coin happens to land on tails, the probability WAS 70% heads once I learned about the weighting (see the small simulation after this list). Since probability is always conditional on a set of (insufficient) information, one can never lack the data needed to determine it, and so it should always be (theoretically) knowable.
    Edit: "Always" may be a little too strong. There may be some philosophical questions for which we can't determine probability. Still, in real-world situations, while we can "almost never" have absolute certainty, there should "almost always" be a best estimate.

Best Answer

You have certainly identified an important problem and Bayesianism is one attempt at solving it. You can choose an uninformative prior if you wish. I will let others fill in more about the Bayes approach.

However, in the vast majority of circumstances, you know the null is false in the population; you just don't know how big the effect is. For example, if you make up a totally ludicrous hypothesis - e.g., that a person's weight is related to whether their SSN is odd or even - and you somehow manage to get accurate information from the entire population, the two means will not be exactly equal. They will (probably) differ by some insignificant amount, but they won't match exactly.

If you go this route, you will deemphasize p-values and significance tests and spend more time looking at the estimate of the effect size and its accuracy. So, if you have a very large sample, you might find that people with odd SSNs weigh 0.001 pounds more than people with even SSNs, and that the standard error for this estimate is 0.000001 pounds, so p < 0.05 but no one should care.
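A quick simulation illustrates this point. The exact numbers above (a 0.001 lb difference with a 0.000001 lb standard error) would take billions of observations to reproduce, so the sketch below uses smaller, made-up values - a true difference of 0.15 lb, SD 30 lb, two million people per group - but the qualitative message is the same: p is far below .05 while the standardized effect size is negligible.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(42)

    n = 2_000_000                           # hypothetical sample size per group
    sd = 30.0                               # assumed SD of body weight (lb)
    odd_ssn  = rng.normal(150.15, sd, n)    # assumed true means differ by 0.15 lb
    even_ssn = rng.normal(150.00, sd, n)

    t_stat, p_value = stats.ttest_ind(odd_ssn, even_ssn)

    diff = odd_ssn.mean() - even_ssn.mean()
    se = np.sqrt(odd_ssn.var(ddof=1) / n + even_ssn.var(ddof=1) / n)
    cohens_d = diff / np.sqrt((odd_ssn.var(ddof=1) + even_ssn.var(ddof=1)) / 2)

    print(f"p-value: {p_value:.3g}")                   # well below .05
    print(f"difference: {diff:.3f} lb (SE {se:.3f})")  # tiny in practical terms
    print(f"Cohen's d: {cohens_d:.4f}")                # negligible effect size

The "significant" p-value here reflects only the enormous sample; the effect-size estimate and its standard error are what tell you whether anyone should care.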