Multiple Testing – Alpha Adjustment for Multiple Testing

hypothesis-testing multiple-comparisons

I understand the logic of alpha adjustment for multiple testing. However, I am confused as to whether this correction should be applied to all tests on a dataset or only the pairwise comparison in question.

For example, I have four pairwise comparisons (male versus female, married versus other, English versus other languages, young versus old). They all come from the same sample (dataset).

If I use 0.05 and Bonferroni, should my corrected alpha be 0.025 (i.e., calculated for each pairwise comparison) or 0.0125 (i.e., calculated for four pairwise comparisons in total)? How does the concept of familywise error fit in here?

I must add that my interest is in each pairwise comparison (e.g. male compared to female) and NOT across the different pairs (e.g. married male versus married female).

Best Answer

@John has a nice answer. I particularly like the discussion about fishing expeditions and how alpha-adjustment may not be necessary. I want to add one additional aspect to this discussion. With hypothesis testing, there are two different kinds of errors to worry about: type I and type II (also called alpha error and beta error). Both kinds are bad, and we want to avoid both of them. When people talk about alpha-adjustment, they are focusing only on the possibility of type I errors (that is, saying there is a difference when there isn't one). However, lowering alpha to reduce type I errors necessarily decreases power. Thus, it necessarily increases the probability of type II errors (that is, saying there isn't a difference when in fact there is). In addition, it's worth noting that a priori there is no reason to believe that type I errors are worse than type II errors (despite the fact that everyone seems to assume that this must be so). Rather, which is worse will vary from situation to situation and is a judgment that must be made by the researcher. In other words, when deciding on a strategy for testing multiple comparisons (e.g., an alpha-adjustment strategy), you must consider the effect of the strategy on both type I and type II errors and balance these effects relative to the severity of these errors, how much data you have, and the cost of gathering more.
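To make the trade-off concrete, here is a minimal sketch of how a Bonferroni adjustment for the four comparisons in the question changes power. The numbers are assumptions chosen purely for illustration, not taken from the question: a standardized effect size of 0.5, 50 observations per group, and a two-sided two-sample z-test with known unit variance. The helper names `bonferroni_alpha` and `two_sided_z_power` are hypothetical.

```python
from statistics import NormalDist


def bonferroni_alpha(alpha, m):
    """Per-test significance level when m tests are treated as one family."""
    return alpha / m


def two_sided_z_power(effect, n_per_group, alpha):
    """Approximate power of a two-sided, two-sample z-test.

    Assumes equal group sizes and known unit variance; `effect` is the
    standardized mean difference between the two groups.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Expected value of the z statistic under the alternative
    shift = effect * (n_per_group / 2) ** 0.5
    # Probability of landing in either rejection region
    return (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)


alpha = 0.05
m = 4  # sex, marital status, language, age

adjusted = bonferroni_alpha(alpha, m)

# Illustrative (assumed) effect size and sample size
power_unadjusted = two_sided_z_power(0.5, 50, alpha)
power_adjusted = two_sided_z_power(0.5, 50, adjusted)

print(f"per-test alpha, unadjusted: {alpha:.4f}  power: {power_unadjusted:.3f}")
print(f"per-test alpha, Bonferroni: {adjusted:.4f}  power: {power_adjusted:.3f}")
```

Under these assumptions the adjusted test has noticeably lower power than the unadjusted one, so its type II error rate is correspondingly higher. Whether that price is worth paying depends on the relative cost of the two kinds of error in your setting.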

On a different note, from your description it seems to me that your situation would best be analyzed by using a factorial ANOVA, with sex as factor 1, marital status as factor 2, language as factor 3, and age as factor 4. From the description (and I recognize that it is sparse) I don't see why a cell means approach (i.e., one-way ANOVA) is preferable. If you have no interest in interactions, the main effects from the factorial ANOVA are already orthogonal (at least if the $n$s are the same), and Bonferroni corrections are not relevant. Certainly the familywise type I error rate could still exceed 5%, but I'm a big believer in @John's fourth paragraph; when I'm testing theoretically suggested, a priori, orthogonal contrasts, I don't use alpha-adjustments.
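As a rough illustration of how far the familywise rate can drift above 5%, suppose (as a simplifying assumption) that the $m = 4$ tests are independent and all four null hypotheses are true. The probability of at least one type I error is then

$$
\text{FWER} = 1 - (1 - \alpha)^m = 1 - 0.95^4 \approx 0.185 .
$$

With the Bonferroni per-test level $\alpha/m = 0.0125$, the same calculation gives

$$
\text{FWER} = 1 - (1 - 0.0125)^4 \approx 0.049 ,
$$

which is why the 0.0125 figure in the question corresponds to treating all four comparisons as a single family. Tests that share an error term are not exactly independent, so these values are approximations only.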