Solved – Correction of p-values for multiple regression models with multiple comparisons

bonferroni | multiple-regression | multiple-comparisons

I was asked to correct my p-values for multiple comparisons, but I'm not sure which p-values I should correct and, assuming I use Bonferroni, by what number I should divide the .05 level. I have run three sets of multiple regression analyses, each at seven different time points, for a total of 21 models. The method was backwards regression, with independent variables remaining in the model at p < .05. Each set uses the same group of independent variables, but the dependent variables differ (3 outcomes x 7 time points). Because of the nature of backwards regression, the final predictors in each model obviously differ. To make the case clearer, by seven time points I mean that the same variables have been measured at seven time points (i.e. A1, B1 to X1; A2, B2 to X2, etc.).

So, what I really want to know is the correct way to go about this. Do I need to adjust the model ANOVA p-values printed by SPSS, or the p-values of the individual regression coefficients? What baffles me is that, in essence, a given time point's (potential) predictors are used for three different analyses. Then again, there are 7 actual regression analyses done for each set. This matters because some of the individual regression coefficients are close to .05 and would no longer be significant after correction, whereas all of the models would remain significant even when corrected with what I think is the right alpha level.

Thank you in advance.

Best Answer

I can see three related issues:

A) What was your research question? Was there a main outcome and a clearly formulated hypothesis before you gathered data? The number of outcomes and predictors suggests not. If you are performing an exploratory analysis, say so in your report, refrain from making any strong conclusions, forget about p-value corrections, select the best outcome & model, and do a second experiment to replicate.

B) More importantly: it is commonly accepted nowadays that stepwise (forward/backward) regression is bad practice (e.g. see this, this, and especially this post). It will almost certainly lead you to false conclusions, and your reviewers/referees should not let it pass. If you want to do efficient model selection, there are other techniques for that (e.g. LASSO). If you want to do inference (i.e. make claims about your independent variables), fit the full model justified by theory and stick with it (assuming your sample size is adequate for the number of predictors involved). If you insisted on the stepwise approach, you would not only have to correct for the obvious number of multiple comparisons (number of outcomes (3) * time points (7) = 21), but also for all of the multiple comparisons hidden in your backwards regression (i.e. all possible combinations of your predictors; let's call this figure X). To do so in a Bonferroni fashion, just multiply the p-values obtained in SPSS by 3*7*X before comparing them with your threshold ... good luck with that.
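For the straightforward part of the correction (the 21 models), here is a minimal sketch of a Bonferroni adjustment in Python. The p-values below are made up for illustration; substitute the ones SPSS reports for your models.

```python
import numpy as np

# Hypothetical p-values from the 21 models (3 outcomes x 7 time points);
# replace with the values reported by SPSS.
p_values = np.array([0.001, 0.004, 0.012, 0.030, 0.049, 0.002, 0.020,
                     0.003, 0.041, 0.008, 0.015, 0.001, 0.036, 0.005,
                     0.007, 0.022, 0.044, 0.009, 0.018, 0.027, 0.033])

m = len(p_values)  # number of comparisons, here 3 * 7 = 21
alpha = 0.05

# Bonferroni: either shrink the threshold ...
reject = p_values < alpha / m

# ... or, equivalently, inflate the p-values (capped at 1).
p_adjusted = np.minimum(p_values * m, 1.0)

print(f"Bonferroni threshold: {alpha / m:.5f}")
print("Still significant after correction:", reject.sum(), "of", m)
```

The same adjustment is available ready-made via statsmodels, e.g. `statsmodels.stats.multitest.multipletests(p_values, method='bonferroni')`, if you prefer not to do it by hand.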

C) As far as I can see from your description, you have a longitudinal research design in which you obtain repeated observations from the same experimental units, and you then fit a separate regression (a separate statistical model) for each time point. You can greatly reduce the number of multiple comparisons in your analysis by fitting a single mixed model per outcome instead (a minimal sketch follows the book recommendations below). However, this is not trivial and you may need help from a statistician in person to do so. Additionally, I can recommend:

West, Brady T., Kathleen B. Welch, and Andrzej T. Galecki. Linear Mixed Models: A Practical Guide Using Statistical Software. CRC Press, 2014.

and

Gelman, Andrew, and Jennifer Hill. Data Analysis Using Regression and Multilevel/Hierarchical Models. New York, NY, USA: Cambridge University Press, 2007.
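To make the mixed-model suggestion in C) concrete, here is a minimal sketch using statsmodels in Python. The variable names (outcome, pred1, pred2, time, subject), the file name, and the long-format data layout are assumptions for illustration, not your actual variables; one such model would be fitted per outcome.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Assumed long-format data: one row per subject per time point, with columns
# 'subject', 'time', the predictors, and one outcome. Replace the file name
# and column names with your own.
data = pd.read_csv("long_format_data.csv")

# A random intercept per subject accounts for the repeated measurements,
# so all seven time points enter a single model instead of seven separate ones.
model = smf.mixedlm("outcome ~ pred1 + pred2 + time",
                    data=data,
                    groups=data["subject"])
result = model.fit()
print(result.summary())
```

This collapses the seven per-time-point regressions for one outcome into a single model, which is what cuts down the number of comparisons you would otherwise have to correct for.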
