Solved – Is it good practice to adjust for multiple comparisons when performing different multivariable regression models

Tags: multiple-regression, multiple-comparisons, p-value

I have five different outcomes and the same set of independent variables, which I have used in five different multivariable Cox regression models. The independent variables included in the multivariable models are a fairly standard choice.

I used to think that multiple comparison adjustment (e.g. a Bonferroni correction) was a standard procedure when performing univariate analyses but could be skipped when fitting multivariable models, especially when the choice of independent variables is fairly standard. However, when I discussed this with a colleague, he pointed out that a multiple comparison adjustment should be performed anyway (I replied that a cross-validation approach using two subsets would be a better way to establish significance).

In my case I fit 5 different models (using five different outcomes) and 3 out of 5 are significant, but only one p-value is <0.01 (the other two are around 0.02). If I apply a multiple comparison adjustment, I would have to completely change my view of the results. What do you think?
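For concreteness, the sketch below shows how such an adjustment could be computed with statsmodels. The five p-values are hypothetical placeholders chosen only to roughly match the pattern described above (one below 0.01, two around 0.02), not the actual study results.

```python
# Minimal sketch: adjusting five p-values (one per outcome) for multiplicity.
# The p-values are hypothetical placeholders, not the questioner's data.
from statsmodels.stats.multitest import multipletests

pvals = [0.008, 0.020, 0.022, 0.15, 0.40]

for method in ("bonferroni", "holm"):
    reject, p_adj, _, _ = multipletests(pvals, alpha=0.05, method=method)
    print(method, [round(p, 3) for p in p_adj], list(reject))
```

With five outcomes, Bonferroni reduces the per-test threshold to 0.05/5 = 0.01, so only the result with p < 0.01 would remain significant; Holm is slightly less conservative but gives the same conclusion for these placeholder values.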

Best Answer

Firstly, when you perform multiple hypothesis tests (as you do by looking at p-values for multiple outcomes), there is in principle a multiplicity issue in the sense that comparing each p-value against the level $\alpha$ results in a familywise type I error rate $>\alpha$. I do not think this really goes away if you do cross-validation. Whether you need to control the familywise type I error rate, and across which analyses, is a complicated issue. E.g. if you write two separate papers on the same dataset, do you get twice the familywise type I error rate, but not if you put the results in the same paper? This is really only (relatively) clear in a few settings, such as confirmatory clinical trials for getting regulatory approval for a drug.
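To make the inflation concrete, under the idealized assumption of five independent tests of true null hypotheses, each at level $\alpha = 0.05$, the familywise error rate is
$$\text{FWER} = 1 - (1-\alpha)^m = 1 - 0.95^5 \approx 0.226,$$
i.e. roughly a one-in-four chance of at least one false positive. With correlated outcomes the inflation is typically smaller, but it still exceeds $\alpha$ unless the tests are essentially perfectly dependent.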

Secondly, the practical reason why many people are keen on adjustments is that many researchers take a "many shots on target" approach, where they do lots of comparisons and then emphasize those with an unadjusted p-value <0.05. It is clear that when people study a lot of things, including a huge number of things that really do not affect the outcomes being studied, this will fill the scientific literature with many purported findings that are just random noise. This only gets worse when many small decisions are left open until after the data have been collected, which creates the potential for analysis choices to be data-dependent. It may be debatable whether multiplicity adjustments help that much with such issues, but I suspect I am not alone in trusting results with p<0.05 less when (a) the study was not pre-registered with outcomes and analyses pre-specified, (b) lots of outcomes were studied, (c) no adjustment was made for multiplicity, and of course (d) the claimed effect is not a priori plausible.

Thirdly, you should not completely change your view of the results just because p=0.02 or p is just over 0.05. The former is not completely compelling evidence (and I would not get too excited about it), and the latter does not mean that the hypothesized effect is not there. Of course, this may affect what journal editors and reviewers will let you write (and whether they get excited about your paper), so in practical terms it may sadly make a major difference.