Solved – How to identify which predictors should be included in a multiple regression

correlation, multiple regression

I am a medical researcher, not a statistician. I have 5 outcomes, and for each one I want to identify independent predictors using multiple regression. I have many candidate variables that could be included in the multiple regression as independent variables (IVs).

One colleague advises running a Spearman correlation matrix between all IVs and each dependent variable (DV), and then including only the significantly correlated IVs in the multiple regression.

Questions

  • Is it appropriate to include only predictors that have a significant bivariate Spearman correlation with the outcome?
  • Alternatively, what is a good way to determine inclusion of predictors in a multiple regression?

Best Answer

The model should be formulated using subject matter expertise. It is not a good idea to use the data to tell you which data to use: the data are not information-rich enough to do this reliably. Should you have too few events per variable, that is, too many candidate predictors for your sample size (one rule of thumb is to have at least 15 subjects per parameter in the model), strongly consider data reduction methods that are blinded to $Y$. These include principal components, variable clustering, and redundancy analysis. Examples are in my course notes at http://biostat.mc.vanderbilt.edu/CourseBios330.
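
As a rough illustration of data reduction that is blinded to $Y$, here is a minimal Python sketch using principal components. It is not taken from the course notes. The data are simulated, the function name `reduce_predictors` is hypothetical, and the sample size (120 subjects, 30 candidate predictors) is an assumption chosen only to make the arithmetic visible. The key point is that the outcome is never consulted while the predictors are being summarized.

```python
import numpy as np

def reduce_predictors(X, n_params_budget):
    """Summarize the columns of X by principal component scores.

    The outcome y is deliberately not an argument: the reduction is
    blinded to y. (Hypothetical helper, for illustration only.)
    """
    # Standardize each candidate predictor so none dominates by scale.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # SVD of the standardized matrix yields the principal components.
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    k = min(n_params_budget, Z.shape[1])
    scores = U[:, :k] * s[:k]               # one column per retained component
    explained = (s ** 2) / np.sum(s ** 2)   # share of variance per component
    return scores, explained[:k]

rng = np.random.default_rng(0)
n, p = 120, 30                        # assumed: 120 subjects, 30 candidate predictors
X = rng.normal(size=(n, p))           # simulated predictors
y = X[:, 0] + rng.normal(size=n)      # simulated outcome, used only in the final fit

budget = n // 15                      # rule of thumb: about 15 subjects per parameter
scores, explained = reduce_predictors(X, budget)

# Ordinary least squares on the reduced predictors (intercept plus k scores).
design = np.column_stack([np.ones(n), scores])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
print(scores.shape, coef.shape)
```

Under these assumptions, the 30 candidate predictors are collapsed into as many component scores as the sample size can support, and only then is the outcome brought in. Variable clustering and redundancy analysis follow the same principle but are not shown here.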
