I would like to test for collinearity between possible "predictors (risk factors)" of a binary outcome (death).
The possible "predictors" are either categorical (always binary) or continuous.
- For two continuous predictors (age, weight, etc.), I can use a bivariate correlation, i.e. Pearson or Spearman, right?
- How should I deal with categorical (binary) vs. categorical (binary)? Would a chi-square test or Fisher's exact test be correct, with a p-value above 0.05 meaning there is no difference? Or would that be totally incorrect?
- How should I deal with categorical (binary) vs. continuous? For example, take premature birth (binary categorical) and birth weight (continuous) as potential predictors: they are clearly correlated with each other (premature birth almost always means lower birth weight).
I would like to use binary logistic regression as the final multivariable model.
I am using SPSS 22.0
Best Answer
I would say you're on the right track.
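For continuous ~ continuous data, you can use the Pearson correlation coefficient to measure the strength of a linear relationship. If the relationship is monotonic but not linear, or the data are skewed or contain outliers, I suggest the Spearman rank correlation instead, as it is nonparametric.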
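To see what the two coefficients actually compute, here is a minimal Python sketch (the answer itself is about SPSS; the function names below are my own, not an SPSS or library API). Spearman's rho is simply Pearson's r computed on ranks, with tied values given their average rank, so a monotonic but curved relationship gets rho = 1 while r stays below 1.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Hypothetical data: y rises monotonically with x, but not linearly.
x = [1, 2, 3, 4, 5]
y = [v ** 2 for v in x]
print(pearson_r(x, y), spearman_rho(x, y))
```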
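For binary ~ binary (categorical), a chi-squared test (or Fisher's exact test when expected cell counts are small) tells you whether the two variables are associated, while the phi coefficient measures the strength of that association. Note that p > 0.05 does not prove the variables are unrelated; it only means the test did not detect an association. Even if two predictors are associated, you may still want to include both while building the model, then remove one later as you refine it.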
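A minimal sketch of the 2x2 case in plain Python, assuming a table laid out as a, b in the first row and c, d in the second (the function names are hypothetical helpers, not SPSS output). It computes the Pearson chi-square statistic without continuity correction and its p-value with 1 degree of freedom, the phi coefficient (for a 2x2 table, chi-square = n * phi squared), and a two-sided Fisher's exact p-value.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    observed = ((a, b), (c, d))
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

def phi_coefficient(a, b, c, d):
    """Signed strength of association between two binary variables (-1 to 1)."""
    return (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value: total probability of all tables with the
    same margins that are no more probable than the observed table."""
    r1, r2, c1 = a + b, c + d, a + c
    n = r1 + r2

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x.
        return math.comb(r1, x) * math.comb(r2, c1 - x) / math.comb(n, c1)

    p_obs = prob(a)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    # Small tolerance so tables tied with the observed one are not lost to rounding.
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, total)

# Hypothetical counts: rows = predictor A yes/no, columns = predictor B yes/no.
chi2, p = chi_square_2x2(3, 1, 1, 3)
print(chi2, p, phi_coefficient(3, 1, 1, 3), fisher_exact_two_sided(3, 1, 1, 3))
```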
For binary ~ continuous, you can try the point-biserial correlation, which is the Pearson correlation with the binary variable coded 0/1. Laerd Statistics has a good tutorial on running it in SPSS: https://statistics.laerd.com/spss-tutorials/point-biserial-correlation-using-spss-statistics.php
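The point-biserial coefficient can be sketched in Python from its textbook formula and checked against Pearson's r on the 0/1 coding. The birth-weight numbers below are invented for illustration only, and the function names are hypothetical.

```python
import math
from statistics import mean, pstdev

def pearson_r(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def point_biserial(binary, continuous):
    """r_pb = (M1 - M0) / s * sqrt(p * q), where s is the population SD of the
    continuous variable and p, q are the proportions coded 1 and 0."""
    group1 = [y for g, y in zip(binary, continuous) if g == 1]
    group0 = [y for g, y in zip(binary, continuous) if g == 0]
    p = len(group1) / len(continuous)
    q = 1 - p
    return (mean(group1) - mean(group0)) / pstdev(continuous) * math.sqrt(p * q)

# Invented example: 1 = premature birth, 0 = term birth; weights in grams.
premature = [1, 1, 1, 0, 0, 0, 0, 0]
birth_weight = [1800, 2100, 2500, 3100, 3300, 3500, 2900, 3600]
print(point_biserial(premature, birth_weight))  # negative: the premature group is lighter
print(pearson_r(premature, birth_weight))       # equal up to rounding
```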
As you thin down your models, I suggest comparing them with AIC and BIC (lower is better), as well as a likelihood-ratio chi-square test on the change in deviance (-2 log-likelihood) between nested models. This should help you decide which variables to keep or remove.
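Once each candidate model has been fitted, the comparison described above needs only each model's -2 log-likelihood (SPSS reports it as "-2 Log likelihood" for a fitted logistic regression), its number of estimated parameters including the intercept, and the number of cases. A minimal Python sketch, assuming integer degrees of freedom; the function names and example numbers are hypothetical.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square with integer df >= 1."""
    if x <= 0:
        return 1.0
    h = x / 2
    if df % 2 == 0:
        # Even df: closed-form finite sum.
        return math.exp(-h) * sum(h ** i / math.factorial(i) for i in range(df // 2))
    # Odd df: erfc term plus half-integer gamma terms.
    tail = math.erfc(math.sqrt(h))
    for i in range(1, (df - 1) // 2 + 1):
        tail += math.exp(-h) * h ** (i - 0.5) / math.gamma(i + 0.5)
    return tail

def information_criteria(minus2ll, n_params, n_obs):
    """AIC and BIC from a model's -2 log-likelihood (its deviance for 0/1 data)."""
    aic = minus2ll + 2 * n_params
    bic = minus2ll + n_params * math.log(n_obs)
    return aic, bic

def likelihood_ratio_test(minus2ll_reduced, k_reduced, minus2ll_full, k_full):
    """Chi-square test on the drop in deviance from a reduced to a nested full model."""
    stat = minus2ll_reduced - minus2ll_full
    df = k_full - k_reduced
    return stat, df, chi2_sf(stat, df)

# Hypothetical: full model (intercept + 3 predictors) vs. one predictor dropped, 200 cases.
print(information_criteria(100.0, 4, 200), information_criteria(106.0, 3, 200))
print(likelihood_ratio_test(106.0, 3, 100.0, 4))
```

A significant likelihood-ratio test argues for keeping the dropped variable; AIC and BIC can also compare non-nested candidates fitted to the same cases.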
Hope this helps.