Solved – About the Bonferroni correction

bonferroni, multiple-comparisons, statistical significance

I have 50 variables in my dataset. I have correlated each variable against every other variable, so I have $49 \cdot 50/2 = 1225$ unique correlations when no variable is tested against itself. Now, suppose I want to correct the statistics for multiple comparisons, and that for some reason (well, just for learning now) I want to use the Bonferroni correction. Let the threshold for significance be $\alpha = 0.05$. Is the corrected threshold $0.05/1225$ (where 1225 is the number of tests) or $0.05/49$ (because each variable was correlated with 49 others)?

Best Answer

With the Bonferroni correction, the divisor is equal to the number of tests you carry out, whether those tests are dependent or independent. Here that means $0.05/1225$.

It helps to understand the purpose of the Bonferroni correction. You are testing correlations between variables. Let’s assume your null hypothesis for any two variables is that the correlation is 0. (Any null hypothesis will suffice.) Your significance threshold is $\alpha = 0.05$. In other words, when the null hypothesis is true, there is a 5% chance that you will reject it erroneously. This is known as a type I error or a false positive.

Now, let’s say you did 100 tests, all at the $\alpha = 0.05$ level. If every null hypothesis were true, you would expect $5\%$ of these, i.e. about 5 tests, to give a false positive (to reject the null by chance alone). If you do 1225 tests then you expect $0.05 \cdot 1225 \approx 61$ false positives. This is quite a lot! Bonferroni offers a level of protection against this scenario. You can think of it as familywise protection: it gives the whole family of tests a single level of protection against even one false positive. Instead of testing with an $\alpha = 0.05$ threshold, you perform each test at the $0.05/1225 \approx 0.0000408$ threshold. In this case Bonferroni keeps the probability of even one false positive amongst all 1225 tests at or below $0.05$.
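To make the arithmetic above concrete, here is a minimal Python sketch using the numbers from the question. The variable names are only illustrative. The uncorrected familywise figure assumes independent tests, which pairwise correlations that share variables are not, so treat it as a rough indication only. The Bonferroni bound itself does not need that assumption.

```python
from math import comb

n_vars = 50   # number of variables in the dataset (from the question)
alpha = 0.05  # desired familywise significance level

# Number of unique pairwise correlations: 50 choose 2 = 1225
n_tests = comb(n_vars, 2)

# Expected number of false positives if every null were true and
# each test were run at the uncorrected level alpha (about 61.25)
expected_false_positives = alpha * n_tests

# Bonferroni: divide alpha by the number of tests (about 4.08e-05)
bonferroni_threshold = alpha / n_tests

# Chance of at least one false positive without correction,
# ASSUMING independent tests (an illustrative simplification)
fwer_uncorrected = 1 - (1 - alpha) ** n_tests

# Union bound on the familywise error rate after Bonferroni;
# this holds regardless of dependence between tests
fwer_bonferroni_bound = n_tests * bonferroni_threshold

print(f"tests: {n_tests}")
print(f"expected false positives (uncorrected): {expected_false_positives:.2f}")
print(f"Bonferroni per-test threshold: {bonferroni_threshold:.3g}")
print(f"familywise error bound after Bonferroni: {fwer_bonferroni_bound:.3g}")
```

A correlation would then be declared significant only if its p-value is at or below `bonferroni_threshold`.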

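Bonferroni is very conservative, and several less conservative alternatives exist. My favourite is controlling the False Discovery Rate, which limits the expected proportion of false positives among the rejected hypotheses rather than the chance of any false positive at all.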
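The answer does not name a specific False Discovery Rate method. As an assumption, the sketch below uses the Benjamini-Hochberg step-up procedure, which is the most common one, and compares it with Bonferroni. The function names and the ten p-values are made up purely for illustration.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject each hypothesis whose p-value is <= alpha / (number of tests)."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, q=0.05):
    """Benjamini-Hochberg step-up procedure at FDR level q.

    Sort the p-values, find the largest rank k with p_(k) <= (k / m) * q,
    and reject the k hypotheses with the smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            k_max = rank
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values, not taken from any real dataset
example_p = [0.001, 0.008, 0.039, 0.041, 0.042,
             0.060, 0.074, 0.205, 0.212, 0.216]

print("Bonferroni rejections:", sum(bonferroni(example_p)))
print("Benjamini-Hochberg rejections:", sum(benjamini_hochberg(example_p)))
```

On these made-up values Bonferroni rejects one hypothesis and Benjamini-Hochberg rejects two, which shows the sense in which the FDR approach is less conservative. The price is a weaker guarantee: it controls the expected share of false discoveries, not the probability of making any.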
