Hypothesis Testing – Should Adjustments for Multiple Comparisons Be Made for Completely Different Hypotheses?

bonferroni, hypothesis testing, multiple-comparisons, statistical significance

A typical situation: a statistical journal asks me (I cannot refuse, or I won't get published) to summarize several baseline characteristics and compare them with a statistical test across two groups, G1 and G2, namely:

  • age at G1 and G2
  • gender % at G1 and G2
  • some clinical parameter at G1 and G2
  • % of co-morbidities at G1 and G2
  • … 10 additional measures at G1 and G2

None of the above comparisons share the same data.

In other words, it's not like testing for pairwise differences: baseline-time1, baseline-time2, baseline-time3, time1-time2, time1-time3, time2-time3 and so on, where we naturally adjust the significance level (using Dunnett's or Tukey's procedure, for instance).

It's just a set of tests for a set of very different outcomes.

So I get:

Sex:
    males/females: G1=xx%, G2=xx%
    diff=xx, 95% CI= [xx - xx]
    p-value = 0.xx
    effect size ABC = xx, small

Age:
    mean: G1=xx.x, G2=xx.x
    diff=xx.x, 95% CI= [xx.x - xx.x]
    p-value = 0.xx
    effect size ABC = xx, moderate

and so on.

The reported p-values and CIs are unadjusted here.

In my opinion this summary should be WITHOUT any statistical inference, as it just summarizes the "local" data from the experiment, but the journal demands that I add it, so I am going to add it. Let's not question that and focus on my problem.

Should I leave the p-values unadjusted, because every comparison is about something very different and there is no "p-catching",
or
adjust them for something? But for what?

With 15 parameters, I'd need to lower the significance level to 0.05/15 ≈ 0.0033, assuming the simplest Bonferroni adjustment (I could also use Holm's method, which has more power, I guess).
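To make the difference between the two adjustments concrete, here is a minimal sketch in plain Python. The p-values in `raw` are hypothetical and chosen only for illustration (five tests rather than 15, to keep it short); the function names `bonferroni` and `holm` are my own and do not come from any library.

```python
def bonferroni(pvals):
    """Bonferroni-adjusted p-values: multiply each by the number of tests, cap at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def holm(pvals):
    """Holm step-down adjusted p-values, returned in the original order."""
    m = len(pvals)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The k-th smallest p-value (k = rank + 1) is multiplied by m - k + 1.
        candidate = min(1.0, (m - rank) * pvals[i])
        # Enforce monotonicity: an adjusted value never drops below an earlier one.
        running_max = max(running_max, candidate)
        adjusted[i] = running_max
    return adjusted


# Hypothetical unadjusted p-values, for illustration only.
raw = [0.001, 0.012, 0.030, 0.200, 0.650]
alpha = 0.05

bonf = bonferroni(raw)
holm_adj = holm(raw)

for p, b, h in zip(raw, bonf, holm_adj):
    print(f"raw={p:.3f}  bonferroni={b:.3f}  holm={h:.3f}")

print("significant after Bonferroni:", sum(b < alpha for b in bonf))
print("significant after Holm:", sum(h < alpha for h in holm_adj))
```

Comparing an adjusted p-value with 0.05 is equivalent to comparing the raw p-value with the lowered threshold. Holm's adjusted values are never larger than Bonferroni's, which is where the extra power comes from.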

But is this really necessary in THIS very case?

Please note, I'm aware of the fallacy saying that "one doesn't have to adjust for orthogonal hypotheses or planned comparisons". Here it's not an orthogonal contrast (like effect coding or Helmert), it's just a set of different questions.

The goal of the summary is not to find "at least one rejected hypothesis in the experiment" (because the experiment has its own primary objective), but rather to describe the "homogeneity" of the groups, to say: "OK, the two groups were homogeneous in terms of age and sex", which sets the context for further analyses.

Best Answer

Welcome to SE!

I agree with you that p-values in such tables are rarely a good idea (though journals often request them). Usually, the p-values reported for baseline characteristics are not adjusted. However, since you want your groups not to differ at baseline, non-significant p-values are what you hope to see, and a Bonferroni correction, which can only make the p-values larger, actually works in your favor. So if you follow tradition, you do not need to adjust; if you do adjust, it will not harm your study.