Linear mixed effects model: multiplicity issue and adjusting p-values

In our randomized controlled trial, we used linear mixed effects models to test between-group differences in change from baseline to six months, adjusting for important covariates. We ran a separate analysis for each outcome. We considered six candidate covariates and used stepwise selection based on the Akaike information criterion (AIC) to choose the best covariate set.

A reviewer came back to us saying that we should have addressed multiplicity and adjusted our p-values because of possible inflation of the type I error rate.
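To make the reviewer's concern concrete, here is a minimal Python sketch of the familywise error rate: the probability of at least one false positive when every null hypothesis is true and each test is run at $\alpha = 0.05$. It assumes the $k$ tests are independent; outcomes measured on the same subjects are usually correlated, in which case the true rate generally lies between $\alpha$ and this figure. The function name is illustrative.

```python
def familywise_error_rate(alpha: float, k: int) -> float:
    """P(at least one false positive) for k independent tests at level alpha,
    assuming every null hypothesis is true."""
    return 1.0 - (1.0 - alpha) ** k


for k in (1, 2, 5):
    print(f"k = {k}: familywise error rate = {familywise_error_rate(0.05, k):.4f}")
```

With two outcomes this gives about 0.0975, nearly double the nominal 0.05.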

I am not sure whether this is true, but I don't think we should do this, because we did not conduct any post hoc analyses. Also, our analysis did not involve multiple levels of treatment. So I don't really think we should adjust our p-values. The only thing I believe we should adjust for is the selection of covariates, but that was taken care of by an iterative model selection technique, namely stepwise variable selection based on AIC (stepAIC).

P.S. I searched the site for possible answers, and couldn't find any that fits.

Best Answer

Since the reviewer seems concerned only about the two outcomes measured on the same subjects (and did not question the modeling procedure itself), I would simply use a sequential Bonferroni adjustment (also known as the Holm-Bonferroni method) to correct for multiplicity.

  1. Sort your $p$-values in ascending order.
  2. Refer to them as $p_i$ (i.e. $p_1 \le p_2 \le p_3 \le \dots \le p_k$).
  3. Then adjust your $\alpha$-level and compare each $p$-value against its new $\alpha$-level, i.e. test whether $p_i \le \alpha / (1 + k - i)$, where $k$ is the number of statistical tests conducted (the number of $p$-values calculated). Stop at the first $p_i$ with $p_i \gt \alpha / (1 + k - i)$: that test and all tests with larger $p$-values are not significant. The $p_i$ that fall below their sequentially adjusted $\alpha$-levels before that point are your significant tests, adjusted for multiplicity according to the Holm-Bonferroni method.
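The three steps above can be sketched in Python as follows. This is a minimal, self-contained illustration (the function name `holm_reject` is hypothetical); it sorts internally and returns the decisions in the original, unsorted order of the input.

```python
def holm_reject(pvalues, alpha=0.05):
    """Holm-Bonferroni step-down procedure.

    Returns a list of booleans in the original order of `pvalues`,
    True where the null hypothesis is rejected.
    """
    k = len(pvalues)
    # Step 1: indices of the p-values sorted in ascending order
    order = sorted(range(k), key=lambda j: pvalues[j])
    reject = [False] * k
    # Steps 2-3: compare the i-th smallest p-value with alpha / (1 + k - i)
    for i, j in enumerate(order, start=1):
        if pvalues[j] <= alpha / (1 + k - i):
            reject[j] = True
        else:
            # First failure: this and all larger p-values are not significant
            break
    return reject


print(holm_reject([0.0024, 0.0084, 0.019, 0.027, 0.12]))
```

Stopping at the first failure is what distinguishes Holm's step-down procedure from simply testing every $p$-value against its own threshold.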

For example, suppose you conducted five tests ($\alpha = 0.05$) resulting in the following $p$-values:

$p_1 = 0.0024, p_2 = 0.0084, p_3 = 0.019, p_4 = 0.027, p_5 = 0.12$

The new $\alpha$-level you compare $p_1$ against is:

$0.05/(1+5-1) = 0.01$

Since $p_1 \le 0.01$ you can move on to $p_2$:

$0.05/(1+5-2) = 0.0125$

Since $p_2 \le 0.0125$ you can move on to $p_3$:

$0.05/(1+5-3) = 0.0167$

Since $p_3 \gt 0.0167$ you can stop.
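The same arithmetic for the worked example, written out as a short Python sketch (assuming the $p$-values are already sorted, as in step 1):

```python
alpha = 0.05
p_sorted = [0.0024, 0.0084, 0.019, 0.027, 0.12]  # p-values from the example, ascending
k = len(p_sorted)

# Holm threshold for the i-th smallest p-value: alpha / (1 + k - i)
thresholds = [alpha / (1 + k - i) for i in range(1, k + 1)]

significant = []  # indices i of the tests that remain significant
for i, (p_i, t_i) in enumerate(zip(p_sorted, thresholds), start=1):
    if p_i > t_i:
        print(f"p_{i} = {p_i:.4f} > {t_i:.4f}: stop")
        break
    print(f"p_{i} = {p_i:.4f} <= {t_i:.4f}: significant")
    significant.append(i)
```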

In this case, of the four initially significant $p$-values, only two remain significant after adjustment for multiplicity. (Note: Instead of adjusting the $\alpha$-levels, you can also adjust the $p$-values and compare them against your chosen $\alpha$-level, e.g. $\alpha = 0.05$. To do so, multiply each $p_i$ by $(1 + k - i)$, then take a running maximum in sorted order so that no adjusted $p$-value is smaller than the one before it, and cap the results at 1.)
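A minimal Python sketch of the adjusted-$p$-value variant (the function name is hypothetical). In practice you would more likely call an established implementation, such as R's `p.adjust(p, method = "holm")` or `multipletests(pvals, method="holm")` from Python's `statsmodels.stats.multitest`; the sketch only illustrates the computation those functions perform.

```python
def holm_adjusted_pvalues(pvalues):
    """Holm-adjusted p-values, returned in the original order.

    The i-th smallest p-value is multiplied by (1 + k - i); a running
    maximum keeps the adjusted values non-decreasing in sorted order,
    and results are capped at 1.
    """
    k = len(pvalues)
    order = sorted(range(k), key=lambda j: pvalues[j])
    adjusted = [0.0] * k
    running_max = 0.0
    for i, j in enumerate(order, start=1):
        running_max = max(running_max, (1 + k - i) * pvalues[j])
        adjusted[j] = min(1.0, running_max)
    return adjusted


adj = holm_adjusted_pvalues([0.0024, 0.0084, 0.019, 0.027, 0.12])
print([round(a, 4) for a in adj])  # [0.012, 0.0336, 0.057, 0.057, 0.12]
```

Note the fourth value: $2 \cdot 0.027 = 0.054$ is raised to $0.057$ by the running maximum. Comparing these adjusted values against $\alpha = 0.05$ gives the same two significant tests as the step-down comparison.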

See also:

Abdi, H. (2010). Holm’s sequential Bonferroni procedure. Encyclopedia of Research Design, 1.

Peres-Neto, P. R. (1999). How many statistical tests are too many? The problem of conducting multiple ecological inferences revisited. Marine Ecology Progress Series, 176, 303-306.

Alternatively, you could argue that you don't want to adjust for multiplicity, for example because adjustment lowers the power of each test and so increases the risk of type II errors.

See, for example: Feise, R. J. (2002). Do multiple outcome measures require p-value adjustment? BMC Medical Research Methodology, 2(1), 1.

Or this one: Gelman, A., Hill, J., & Yajima, M. (2012). Why we (usually) don't have to worry about multiple comparisons. Journal of Research on Educational Effectiveness, 5(2), 189-211.
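To illustrate the type II error argument, here is a rough Python sketch comparing the power of a two-sided $z$-test at the unadjusted level with its power at the Bonferroni level $\alpha / k$ for $k = 2$ (the threshold Holm's procedure applies to the smallest $p$-value). The effect size of 2.8 standard errors is a hypothetical value chosen to give roughly 80% power without adjustment; it is not taken from the trial, and the normal approximation is an assumption.

```python
from statistics import NormalDist


def two_sided_z_power(effect: float, alpha: float) -> float:
    """Power of a two-sided z-test when the true effect equals `effect`
    standard errors (hypothetical illustration)."""
    z = NormalDist()
    crit = z.inv_cdf(1 - alpha / 2)  # two-sided critical value
    # Probability of landing in either rejection region under the alternative
    return (1 - z.cdf(crit - effect)) + z.cdf(-crit - effect)


effect = 2.8  # hypothetical effect size in standard-error units
for label, a in [("unadjusted", 0.05), ("Bonferroni, k = 2", 0.025)]:
    print(f"{label}: alpha = {a}, power = {two_sided_z_power(effect, a):.3f}")
```

Under these assumptions, power drops from about 0.80 to about 0.71, so the chance of a type II error rises from about 20% to about 29%.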