Solved – Multiple hypothesis testing and F tests

Tags: f-test, multiple-comparisons, statistical-significance

Does the multiple hypothesis testing problem apply to the calculation of an F statistic for joint significance? It seems to me that the more variables you include in your test for joint significance, the higher the chance that at least one of those tests produces a false positive, right?

If so:

  1. For all intents and purposes, is it really that big of a deal?
  2. Is there a way to get around this?

EDIT: The responses make me think that I am understanding either the multiple hypothesis testing problem or F tests wrong. Here's an explanation of the conflict that is in my head, which may be incorrect.

My understanding of the multiple hypothesis testing problem is this: our alpha level (e.g. $\alpha = 0.05$) is our accepted Type I error rate, which is sort of a theoretical concept. If we are testing multiple hypotheses simultaneously, like $H_1: x_1 = 0$ and $H_2: x_2 = 0$, then we are implicitly testing the joint null $H_0: x_1 = 0 \text{ and } x_2 = 0$, right? In which case we would add $0.05 + 0.05$ to get the probability of making a Type I error for the joint test, right?

And it's my assumption that theoretically, this is what an F test does. So if you are running an F test on 20 variables, for example, then you are guaranteed, theoretically, to get a Type I error, right?

Or, now that I think about it, perhaps I am understanding Type I error incorrectly. Any help would be appreciated.

Best Answer

The null hypothesis for the F-test in ANOVA, say, is that all the group means are equal. The alternative is that at least two are unequal. So by including more groups you could be reducing the power of the F-test* - it becomes harder to detect non-zero effects if you include more groups with zero effects - but not increasing its size - the significance level is controlled at the stated one.

[Edit: If you carry out twenty t-tests on twenty different treatments, you're right in thinking that the Type I error rate will be inflated, but wrong in thinking a Type I error is guaranteed - if the tests are independent & the null hypothesis is true in each case, the chance of making at least one Type I error will be $1-(1-0.05)^{20} \approx 0.64$.
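That familywise error rate is easy to check by simulation. Here is a minimal sketch (my own illustration, using numpy and scipy, not part of the original answer): repeatedly run twenty independent one-sample t-tests whose nulls are all true, and count how often at least one of them rejects at $\alpha = 0.05$.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_sims, n_tests, n, alpha = 5000, 20, 30, 0.05

hits = 0
for _ in range(n_sims):
    # 20 independent one-sample t-tests, each with a true null (mean 0)
    samples = rng.normal(size=(n_tests, n))
    pvals = stats.ttest_1samp(samples, 0.0, axis=1).pvalue
    hits += (pvals < alpha).any()

fwer = hits / n_sims
print(f"simulated familywise error rate: {fwer:.3f}")
print(f"theoretical 1 - (1 - 0.05)**20:  {1 - 0.95**20:.3f}")
```

The simulated rate should land close to the theoretical $1-(1-0.05)^{20} \approx 0.64$; the sample size and number of simulations here are arbitrary choices.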

When you carry out an F-test for ANOVA you look at the treatment effects all together (the sum of their squares). The statistic you calculate follows an F-distribution under the null hypothesis that the sum of squared treatment effects in the population is zero, which is only true when each single treatment effect is zero, i.e. all group means are the same. (The derivation comes from Cochran's theorem - or you can easily investigate through simulation.)
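Taking up that suggestion to investigate through simulation, the following sketch (mine, not the original answerer's) checks that the F-test's size stays at the nominal level however many groups you include, by simulating one-way ANOVAs in which every group mean is the same:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_sims, n_per_group, alpha = 4000, 20, 0.05

rates = {}
for k in (3, 10, 20):                      # number of groups, all with equal true means
    rejections = 0
    for _ in range(n_sims):
        groups = [rng.normal(size=n_per_group) for _ in range(k)]
        rejections += stats.f_oneway(*groups).pvalue < alpha
    rates[k] = rejections / n_sims
    print(f"{k:2d} groups: rejection rate = {rates[k]:.3f}")
```

Whether you use 3, 10, or 20 groups, the rejection rate under the null hovers around $\alpha = 0.05$ (up to simulation noise): the F-test does not inflate the Type I error rate as more groups are added.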

*On reflection it seems wrong, or at least sloppy, to say the power of the F-test changes as you include more groups in an ANOVA; you're estimating a new & different effect, not the same effect with a different power. Perhaps 'diluting the effect' would express the idea better.]
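The 'dilution' idea can also be illustrated by simulation (again my own sketch, with arbitrary effect size and sample sizes): fix one group whose mean is genuinely shifted, then pad the design with more and more zero-effect groups and watch the F-test's rejection rate fall.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n_sims, n_per_group, alpha, shift = 2000, 20, 0.05, 0.5

rates = {}
for k_null in (2, 10, 20):                 # zero-effect groups added to the design
    rejections = 0
    for _ in range(n_sims):
        groups = [rng.normal(loc=shift, size=n_per_group)]          # one real effect
        groups += [rng.normal(size=n_per_group) for _ in range(k_null)]
        rejections += stats.f_oneway(*groups).pvalue < alpha
    rates[k_null] = rejections / n_sims
    print(f"1 shifted + {k_null:2d} null groups: rejection rate = {rates[k_null]:.3f}")
```

The rejection rate drops as zero-effect groups are added, even though the shifted group never changes - the extra numerator degrees of freedom dilute the single real effect.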
