Hypothesis Testing – Are Multiple Comparisons Corrections Necessary for Informal Comparisons?

hypothesis-testing, multiple-comparisons

I have a sort of philosophical question about when multiple comparison correction is necessary.

I am measuring a continuous, time-varying signal (sampled at discrete time points). Separate events take place from time to time, and I would like to establish whether these events have a significant effect on the measured signal.

So I can take the mean signal that follows an event, and usually I can see some effect there, with a peak at a certain delay. If I choose the time of that peak and run, say, a t-test to determine whether the signal there differs significantly from when the event doesn't occur, do I need a multiple comparison correction?

Although I only ever performed one t-test (calculated one p-value), my initial visual inspection picked out the time point with the largest apparent effect from the (say) 15 post-event time points I plotted. So do I need to apply a multiple comparison correction for those 15 tests I never actually performed?

If I didn't use visual inspection, but instead ran the test at every event lag and chose the most significant one, I would surely need to correct. I am just a little confused as to whether I need to when the 'best delay' is selected by some criterion other than the test itself (e.g. visual inspection, largest mean, etc.).
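To make the concern concrete, here is a minimal simulation sketch (entirely made-up Gaussian data standing in for my setup, with no real event effect at any lag): pick the lag with the largest mean difference out of 15 post-event lags, then run a single t-test only at that lag, and see how often that 'one' test comes out significant at alpha = 0.05.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_trials, n_lags, alpha, n_sims = 30, 15, 0.05, 2000
false_positives = 0

for _ in range(n_sims):
    # Pure noise: the event has no effect at any lag.
    event = rng.normal(size=(n_trials, n_lags))
    baseline = rng.normal(size=(n_trials, n_lags))

    # Stand-in for the visual selection: the lag with the largest mean difference.
    best_lag = np.argmax(event.mean(axis=0) - baseline.mean(axis=0))

    # The single t-test that is actually computed.
    p = stats.ttest_ind(event[:, best_lag], baseline[:, best_lag]).pvalue
    false_positives += p < alpha

print(f"empirical type I error: {false_positives / n_sims:.3f} (nominal {alpha})")
```

In runs like this the rejection rate lands far above the nominal 5%, even though only one t-test is ever computed per simulated data set, which is exactly what worries me.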

Best Answer

Technically, when you do a visual preselection of where to run the test, you should already correct for that: your eyes and brain have already filtered out some of the uncertainty in the data, and you don't account for that if you simply run the test at that point.

Imagine that your 'peak' is really a plateau, and you hand-pick the 'peak' difference, run a test on it, and it turns out barely significant. If you were to run the test slightly to the left or to the right, the result could change. That is why you have to account for the preselection process: you don't have quite the certainty that you state. You are using the data to do the selection, so you are effectively using the same information twice.

Of course, in practice it is very hard to account for something like a handpicking process, but that doesn't mean you shouldn't try (or, at the very least, you should take and report the resulting confidence intervals and test results with a grain of salt).
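When the handpicking amounts to 'take the lag with the largest apparent effect', one practical way to approximate the missing correction (a sketch, assuming hypothetical trials-by-lags arrays `event` and `baseline` like the ones simulated in the question) is a permutation test on the maximum |t| across all lags: the selection step is repeated inside every permutation, so the resulting p-value already accounts for it.

```python
import numpy as np
from scipy import stats

def max_t_permutation_p(event, baseline, n_perm=5000, seed=0):
    """Permutation p-value for the largest |t| over all lags,
    with the 'pick the biggest effect' step built into the null."""
    rng = np.random.default_rng(seed)
    observed = np.max(np.abs(stats.ttest_ind(event, baseline, axis=0).statistic))

    pooled = np.vstack([event, baseline])
    n_event = event.shape[0]
    exceed = 0
    for _ in range(n_perm):
        shuffled = rng.permutation(pooled)  # shuffle the event/no-event labels
        t = stats.ttest_ind(shuffled[:n_event], shuffled[n_event:], axis=0).statistic
        exceed += np.max(np.abs(t)) >= observed
    return (exceed + 1) / (n_perm + 1)      # add-one to avoid a p-value of exactly 0
```

Because the maximum is taken over the same lags in the observed and in the permuted data, correlations between neighbouring time points are handled automatically, which a flat Bonferroni factor of 15 would not do.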

Conclusion: you should always correct for multiple comparisons if you do multiple comparisons, regardless of how you selected those comparisons. If they weren't picked before seeing the data, you should correct for that in addition.
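For the scenario in the last paragraph of the question, where the test is explicitly run at every one of the (say) 15 lags, the correction itself is cheap. A minimal sketch with made-up data, using Holm's step-down method from statsmodels (less conservative than plain Bonferroni, but still controlling the family-wise error rate):

```python
import numpy as np
from scipy import stats
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(1)
n_trials, n_lags = 30, 15
event = rng.normal(size=(n_trials, n_lags))     # hypothetical trials x lags, event present
baseline = rng.normal(size=(n_trials, n_lags))  # hypothetical trials x lags, no event

# One t-test per lag, then correct the whole family of p-values together.
pvals = stats.ttest_ind(event, baseline, axis=0).pvalue
reject, p_adj, _, _ = multipletests(pvals, alpha=0.05, method="holm")

for lag in range(n_lags):
    print(f"lag {lag:2d}: raw p = {pvals[lag]:.3f}, adjusted p = {p_adj[lag]:.3f}, reject = {reject[lag]}")
```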

Note: an alternative to correcting for manual preselection (e.g. when correcting is practically impossible) is probably to state your results in a way that makes the manual selection step explicit. But that is not quite 'reproducible research', I guess.
