Solved – Is adjusting p-values in a multiple regression for multiple comparisons a good idea


Let's assume you are a social science researcher/econometrician trying to find relevant predictors of demand for a service. You have 2 outcome/dependent variables describing the demand (using the service yes/no, and the number of occasions). You have 10 predictor/independent variables that could theoretically explain the demand (e.g., age, sex, income, price, race, etc.). Running two separate multiple regressions will yield 20 coefficient estimates and their p-values. With enough independent variables in your regressions, you would sooner or later find at least one that shows a statistically significant association with the dependent variable purely by chance.
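To put a rough number on that worry, here is a minimal sketch of the familywise error rate. It assumes all null hypotheses are true and the tests are independent, which regression coefficients generally are not, so treat the figures as an order of magnitude only. The function name is hypothetical.

```python
def familywise_error_rate(n_tests, alpha=0.05):
    # P(at least one p-value < alpha) when every null hypothesis is true
    # and the tests are independent (an idealising assumption).
    return 1.0 - (1.0 - alpha) ** n_tests

rates = {m: familywise_error_rate(m) for m in (1, 10, 20)}
for m, rate in rates.items():
    print(f"{m:2d} tests -> P(at least one false positive) = {rate:.3f}")
```

Under these assumptions, 20 tests at the 5% level give roughly a 64% chance of at least one spurious "significant" coefficient.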

My question: is it a good idea to correct the p-values for multiple tests if I want to include all independent variables in the regression? Any references to prior work are much appreciated.

Best Answer

It seems your question more generally addresses the problem of identifying good predictors. In this case, you should consider using some kind of penalized regression (methods dealing with variable or feature selection are relevant too), e.g. with L1 or L2 penalties, or a combination thereof (the so-called elastic net). Look for related questions on this site, or the R penalized and elasticnet packages, among others.
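To make the idea of a penalty concrete, here is a minimal sketch of elastic net fitted by coordinate descent. It is not the algorithm used by the R packages mentioned above, just a plain-Python illustration. It assumes centred predictors and a centred response (so no intercept) and minimises $\frac{1}{2n}\lVert y - X\beta\rVert^2 + \lambda\left(\alpha\lVert\beta\rVert_1 + \frac{1-\alpha}{2}\lVert\beta\rVert_2^2\right)$. The function names and the toy data are hypothetical.

```python
def soft_threshold(z, gamma):
    # Shrink z towards zero by gamma; values inside [-gamma, gamma] become exactly 0.
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0

def elastic_net(X, y, lam, l1_ratio=1.0, n_iter=100):
    # l1_ratio=1.0 is the lasso (pure L1); l1_ratio=0.0 is ridge (pure L2).
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residuals y - X @ beta, kept up to date
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # constant (all-zero after centring) column
            # Covariance of x_j with the partial residual that excludes x_j.
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam * l1_ratio) / (col_sq[j] + lam * (1.0 - l1_ratio))
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * X[i][j]
                beta[j] = new
    return beta

# Toy data (hypothetical): two orthogonal, centred predictors;
# x1 has a strong effect (2.0), x2 a negligible one (0.1).
X = [[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]]
y = [2.1, -1.9, 1.9, -2.1]

beta_lasso = elastic_net(X, y, lam=0.5)
print(beta_lasso)
```

On this toy design the L1 penalty shrinks the strong coefficient and sets the negligible one exactly to zero, which is the sense in which such penalties perform variable selection.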

Now, about correcting p-values for your regression coefficients (or equivalently your partial correlation coefficients) to protect against over-optimism (e.g. with Bonferroni or, better, step-down methods), it seems this would only be relevant if you are considering one model and seek those predictors that contribute a significant part of explained variance, that is, if you don't perform model selection (with stepwise selection, or hierarchical testing). This article may be a good start: Bonferroni Adjustments in Tests for Regression Coefficients. Be aware that such a correction won't protect you against multicollinearity issues, which affect the reported p-values.
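As an illustration of the difference between Bonferroni and a step-down method, here is a minimal sketch of Bonferroni and Holm adjusted p-values. Holm is one common step-down procedure; the answer does not say which one it has in mind. The p-values are made up for the example.

```python
def bonferroni(pvalues):
    # Multiply every p-value by the number of tests, capped at 1.
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]

def holm(pvalues):
    # Holm step-down: the smallest p-value is multiplied by m, the next by m - 1,
    # and so on; a running maximum keeps the adjusted values monotone.
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (m - rank) * pvalues[idx])
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted

# Hypothetical raw p-values for five regression coefficients.
raw = [0.001, 0.012, 0.020, 0.040, 0.300]
print("Bonferroni:", bonferroni(raw))
print("Holm:      ", holm(raw))
```

With these made-up values and a 5% threshold, Bonferroni retains one coefficient and Holm retains two. Holm is never more conservative than Bonferroni while controlling the same familywise error rate.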

Given your data, I would recommend using some kind of iterative model selection technique. In R, for instance, the stepAIC function performs stepwise model selection by exact AIC. You can also estimate the relative importance of your predictors based on their contribution to $R^2$ using the bootstrap (see the relaimpo package). I think that reporting effect size measures or % of explained variance is more informative than p-values, especially in a confirmatory model.
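For readers who want to see the mechanics, here is a minimal sketch of forward selection by AIC for a Gaussian linear model. It is similar in spirit to what stepAIC automates but is not a reimplementation of it. AIC is computed up to an additive constant as $n \ln(\mathrm{RSS}/n) + 2k$, with $k$ the number of fitted coefficients. All function names and the toy data are hypothetical, and the tiny solver assumes a non-singular design and a non-zero residual sum of squares.

```python
import math
from itertools import product

def ols_rss(X, y):
    # Residual sum of squares of the least-squares fit, via the normal equations.
    n, p = len(X), len(X[0])
    A = [[sum(X[i][r] * X[i][s] for i in range(n)) for s in range(p)] for r in range(p)]
    c = [sum(X[i][r] * y[i] for i in range(n)) for r in range(p)]
    for col in range(p):  # Gaussian elimination with partial pivoting
        piv = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        c[col], c[piv] = c[piv], c[col]
        for r in range(col + 1, p):
            f = A[r][col] / A[col][col]
            for s in range(col, p):
                A[r][s] -= f * A[col][s]
            c[r] -= f * c[col]
    b = [0.0] * p
    for r in reversed(range(p)):  # back substitution
        b[r] = (c[r] - sum(A[r][s] * b[s] for s in range(r + 1, p))) / A[r][r]
    return sum((y[i] - sum(X[i][j] * b[j] for j in range(p))) ** 2 for i in range(n))

def aic(rss, n, k):
    return n * math.log(rss / n) + 2 * k

def forward_aic(data, y):
    # Start from the intercept-only model; add the predictor that lowers AIC most,
    # and stop as soon as no addition lowers it.
    n = len(y)
    selected, remaining = [], list(data)

    def score(names):
        X = [[1.0] + [data[v][i] for v in names] for i in range(n)]
        return aic(ols_rss(X, y), n, len(names) + 1)

    best = score(selected)
    while remaining:
        trial_aic, trial_var = min((score(selected + [v]), v) for v in remaining)
        if trial_aic >= best:
            break
        selected.append(trial_var)
        remaining.remove(trial_var)
        best = trial_aic
    return selected, best

# Toy data (hypothetical): a 2^3 factorial design where only x1 has a strong effect.
rows = list(product([-1.0, 1.0], repeat=3))
data = {f"x{j + 1}": [r[j] for r in rows] for j in range(3)}
y = [1 + 2 * a + 0.1 * b + 0.3 * a * b * c for a, b, c in rows]

selected, best_aic = forward_aic(data, y)
print(selected, round(best_aic, 3))
```

On this toy design the procedure keeps only x1: adding x2 or x3 reduces the residual sum of squares too little to offset the penalty of 2 per extra coefficient.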

It should be noted that stepwise approaches also have their drawbacks (e.g., Wald tests are not adapted to conditional hypotheses as induced by the stepwise procedure), or as indicated by Frank Harrell on the R mailing list, "stepwise variable selection based on AIC has all the problems of stepwise variable selection based on P-values. AIC is just a restatement of the P-Value" (but AIC remains useful if the set of predictors is already defined); a related question -- Is a variable significant in a linear regression model? -- raised interesting comments (@Rob, among others) about the use of AIC for variable selection. I append a couple of references at the end (including papers kindly provided by @Stephan); there are also a lot of other references on P.Mean.

Frank Harrell authored a book, Regression Modeling Strategies, which includes a lot of discussion and advice around this problem (§4.3, pp. 56-60). He also developed efficient R routines to deal with generalized linear models (see the Design or rms packages). So, I think you definitely have to take a look at it (his handouts are available on his homepage).

References

  1. Whittingham, MJ, Stephens, P, Bradbury, RB, and Freckleton, RP (2006). Why do we still use stepwise modelling in ecology and behaviour? Journal of Animal Ecology, 75, 1182-1189.
  2. Austin, PC (2008). Bootstrap model selection had similar performance for selecting authentic and noise variables compared to backward variable elimination: a simulation study. Journal of Clinical Epidemiology, 61(10), 1009-1017.
  3. Austin, PC and Tu, JV (2004). Automated variable selection methods for logistic regression produced unstable models for predicting acute myocardial infarction mortality. Journal of Clinical Epidemiology, 57, 1138–1146.
  4. Greenland, S (1994). Hierarchical regression for epidemiologic analyses of multiple exposures. Environmental Health Perspectives, 102(Suppl 8), 33–39.
  5. Greenland, S (2008). Multiple comparisons and association selection in general epidemiology. International Journal of Epidemiology, 37(3), 430-434.
  6. Beyene, J, Atenafu, EG, Hamid, JS, To, T, and Sung, L (2009). Determining relative importance of variables in developing and validating predictive models. BMC Medical Research Methodology, 9, 64.
  7. Bursac, Z, Gauss, CH, Williams, DK, and Hosmer, DW (2008). Purposeful selection of variables in logistic regression. Source Code for Biology and Medicine, 3, 17.
  8. Brombin, C, Finos, L, and Salmaso, L (2007). Adjusting stepwise p-values in generalized linear models. International Conference on Multiple Comparison Procedures. -- see step.adj() in the R someMTP package.
  9. Wiegand, RE (2010). Performance of using multiple stepwise algorithms for variable selection. Statistics in Medicine, 29(15), 1647–1659.
  10. Moons, KG, Donders, AR, Steyerberg, EW, and Harrell, FE (2004). Penalized Maximum Likelihood Estimation to predict binary outcomes. Journal of Clinical Epidemiology, 57(12), 1262–1270.
  11. Tibshirani, R (1996). Regression shrinkage and selection via the lasso. Journal of The Royal Statistical Society B, 58(1), 267–288.
  12. Efron, B, Hastie, T, Johnstone, I, and Tibshirani, R (2004). Least Angle Regression. Annals of Statistics, 32(2), 407-499.
  13. Flom, PL and Cassell, DL (2007). Stopping Stepwise: Why stepwise and similar selection methods are bad, and what you should use. NESUG 2007 Proceedings.
  14. Shtatland, ES, Cain, E, and Barton, MB (2001). The perils of stepwise logistic regression and how to escape them using information criteria and the Output Delivery System. SUGI 26 Proceedings (pp. 222–226).