I never quite understood the rationale behind the Bonferroni correction. I understand that we want to control the family-wise error rate, but it seems odd to penalise a test based on the existence of another test, given that we do not even know whether we have complete information about all possible tests.

As an example, say we perform some trend test on time series data (say, the Mann-Kendall test) using temperature data for New York from 1900 to 2020. If we perform the test on the full dataset, from 1900 to 2020, that gives us one test, and we find it is significant. We could now perform another test on a subset, using data from 1950 to 2020. We also find this is significant, but we now apply a Bonferroni correction for two tests. We then perform a third test, on 1990 to 2020, and after correcting for three tests the original test (1900-2020) is no longer significant. We could continue like this until we have no significant results, even though we haven't changed the raw data.
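The scenario can be sketched numerically. This is a minimal illustration, not real New York data: the temperatures are simulated, and scipy's `kendalltau` against the year is used as a stand-in for the Mann-Kendall trend test (Mann-Kendall is closely related to Kendall's tau between the series and time).

```python
import numpy as np
from scipy.stats import kendalltau

# Simulated annual temperatures 1900-2020 with a mild warming trend
# (illustrative data, not real observations).
rng = np.random.default_rng(0)
years = np.arange(1900, 2021)
temps = 12.0 + 0.01 * (years - 1900) + rng.normal(0, 0.3, size=years.size)

def trend_pvalue(start, end):
    """p-value for a monotone trend in the window [start, end],
    via Kendall's tau between temperature and year."""
    mask = (years >= start) & (years <= end)
    tau, p = kendalltau(years[mask], temps[mask])
    return p

windows = [(1900, 2020), (1950, 2020), (1990, 2020)]
pvals = [trend_pvalue(a, b) for a, b in windows]

# Bonferroni: each test is now compared against alpha / (number of tests),
# so adding windows raises the bar for every test, including the first.
alpha = 0.05
adjusted_alpha = alpha / len(windows)
for (a, b), p in zip(windows, pvals):
    print(f"{a}-{b}: p = {p:.3g}, significant at {adjusted_alpha:.4f}: {p < adjusted_alpha}")
```

Each additional window shrinks the threshold every earlier test must clear, which is exactly the behaviour the question finds odd.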

## Best Answer

This is why it's important to have a statistical analysis plan, rather than just conducting hypothesis tests on an ad-hoc basis. It's usually good practice to define the hypothesis tests you want to run up front, before actually running any of them. You don't need to know the universe of "all possible tests", you just need to know the universe of "tests you're actually running".

It's reasonable to conduct a tiered analysis where different numbers of hypotheses are considered based on domain knowledge. If you have a strong reason to believe there is an effect in some period of time and run a hypothesis test only in that period, a result of p < 0.05 suggests a true effect. You may then do a more exploratory analysis and run a thousand hypothesis tests over many periods of time, but by the sheer number of tests, it's extremely likely that some of them will have a nominal p < 0.05. This new correction factor doesn't invalidate your original finding, since you ran an analysis with one single hypothesis originally. Prior knowledge can be important, as it can lead you to meaningful findings that don't get corrected out by irrelevant statistical tests where you didn't really expect a positive finding anyway.

Suppose I have 1000 coins and you want to find out if any are biased. If you pick one single coin and get 5 heads in a row, that's rather unexpected and reasonable evidence that the coin is not fair. But if you flip all 1000 coins, it's almost certain that many will produce heads 5 times in a row. This is where honesty in data analysis becomes important, as it would be disingenuous to flip all 1000 coins and then pretend you only flipped the coins that showed p < 0.05. In all cases, the raw data remains the same, but how and why you choose to analyze it will affect your interpretation of it.
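The coin example is easy to verify with a quick simulation (a sketch; the seed and coin counts here are arbitrary choices, and every coin is fair):

```python
import numpy as np

rng = np.random.default_rng(42)

# Flip 1000 fair coins 5 times each.
flips = rng.integers(0, 2, size=(1000, 5))
all_heads = (flips.sum(axis=1) == 5)

# A single pre-chosen coin showing 5 heads has p = (1/2)^5 ~= 0.031,
# nominally significant at alpha = 0.05.
p_single = 0.5 ** 5

# But among 1000 fair coins we expect about 1000/32 ~= 31 such runs.
print("fair coins showing 5 heads in a row:", all_heads.sum())

# The Bonferroni threshold for 1000 tests is 0.05 / 1000 = 5e-5,
# which p = 0.031 does not come close to meeting: once you admit you
# flipped 1000 coins, no single run of 5 heads is surprising.
bonferroni = 0.05 / 1000
print("nominal p:", p_single, "Bonferroni threshold:", bonferroni)
```

The same p-value is strong evidence under one analysis plan (one pre-specified coin) and no evidence at all under another (screen all 1000), which is the answer's point about how and why you analyze the data.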