Solved – Collinearity testing between predictors

multicollinearity

I would like to test for collinearity between possible "predictors" (risk factors) for a binary outcome (death).

Possible "predictors" are categorical (always binary) and continuous…

  1. For two continuous variables (age, weight, etc.), I can use a bivariate correlation, i.e. Pearson or Spearman, right?
  2. How to deal with categorical (binary) vs. categorical (binary)? Would a chi-square test or Fisher's exact test be correct, with a p-value over 0.05 meaning there is no difference? Or would that be totally incorrect?
  3. How to deal with categorical (binary) vs. continuous? E.g., as potential predictors, premature birth (binary categorical) and birth weight (continuous): it's clear they are somehow correlated with each other (premature birth almost always means lower birth weight).

I would like to use binary logistic regression as a final multivariate model …

I am using SPSS 22.0

Best Answer

I would say you're on the right track.

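For continuous ~ continuous data, you can use the Pearson correlation coefficient to measure the strength of the linear association. If the relationship is monotonic but not linear, or the data have outliers or are clearly non-normal, I suggest the Spearman correlation instead, as it is rank-based and nonparametric.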
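To make the Pearson vs. Spearman difference concrete, here is a minimal Python sketch (not SPSS syntax). The helper names `pearson`, `average_ranks` and `spearman` and the data are made up for illustration; in practice you would use your statistics package's built-in bivariate correlation procedure.

```python
import math

def pearson(x, y):
    """Pearson's r: co-variation divided by the product of the spreads."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho is Pearson's r computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Made-up data: y rises with x, but along a curve (y = x**3), not a line.
x = [1, 2, 3, 4, 5, 6]
y = [1, 8, 27, 64, 125, 216]

pearson_r = pearson(x, y)      # below 1, because the trend is not linear
spearman_rho = spearman(x, y)  # 1 (up to rounding), because the trend is perfectly monotonic
```

With a curved but strictly increasing relationship like this one, Spearman reports a perfect association while Pearson does not, which is the reason to prefer Spearman when linearity is doubtful.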

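For binary ~ binary (categorical), a chi-squared test (or Fisher's exact test when expected cell counts are small) tells you whether the two variables are associated. A p-value above 0.05 means you have no evidence of an association, not proof that there is none. For the strength of the association, look at the phi coefficient (for a 2×2 table, Cramér's V is the absolute value of phi). Even if two predictors are associated, you may still want to include both when you first build the model and remove one later as you refine it.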
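As a rough illustration of the test statistic versus the effect size for a 2×2 table, here is a small Python sketch. The function name `chi_square_2x2` and the counts are hypothetical, and it computes the plain Pearson chi-square without a continuity correction, so it may differ slightly from a corrected value in your software's output.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for the table [[a, b], [c, d]].

    Returns (chi2, p_value, phi). No continuity correction is applied.
    """
    n = a + b + c + d
    margins = (a + b) * (c + d) * (a + c) * (b + d)  # product of row and column totals
    cross = a * d - b * c
    chi2 = n * cross ** 2 / margins
    # Upper-tail probability of a chi-square with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    # Phi is the effect size: it ranges from -1 to 1, like a correlation.
    phi = cross / math.sqrt(margins)
    return chi2, p_value, phi

# Made-up counts: rows = predictor A (yes/no), columns = predictor B (yes/no).
chi2, p_value, phi = chi_square_2x2(30, 10, 10, 30)
```

Note that the chi-square statistic grows with sample size (here it equals n times phi squared), so a tiny p-value alone does not tell you the predictors are strongly collinear; phi does.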

For binary ~ continuous, you can try the point-biserial correlation, which is just the Pearson correlation with the binary variable coded 0/1. Laerd Statistics has a good SPSS tutorial for this: https://statistics.laerd.com/spss-tutorials/point-biserial-correlation-using-spss-statistics.php
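Using the premature birth vs. birth weight example from the question, a point-biserial coefficient can be sketched in Python as below. The function name `point_biserial` and the six data points are invented for illustration only.

```python
import math

def point_biserial(binary, values):
    """Point-biserial r for a 0/1 variable and a continuous variable.

    Uses r = (mean_1 - mean_0) / sd * sqrt(p * q), where sd is the
    population standard deviation of all values and p, q are the
    proportions of 1s and 0s.
    """
    group1 = [v for b, v in zip(binary, values) if b == 1]
    group0 = [v for b, v in zip(binary, values) if b == 0]
    n = len(values)
    mean_all = sum(values) / n
    sd = math.sqrt(sum((v - mean_all) ** 2 for v in values) / n)
    mean1 = sum(group1) / len(group1)
    mean0 = sum(group0) / len(group0)
    p, q = len(group1) / n, len(group0) / n
    return (mean1 - mean0) / sd * math.sqrt(p * q)

# Made-up data: 1 = premature, 0 = term; birth weight in grams.
premature = [1, 1, 1, 0, 0, 0]
birth_weight = [1800, 2100, 2400, 3100, 3400, 3700]

r_pb = point_biserial(premature, birth_weight)  # strongly negative for these numbers
```

A coefficient this close to -1 would suggest the two predictors carry largely the same information, so putting both in the logistic regression is likely to inflate standard errors.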

As you thin down your models, I suggest using the AIC and BIC to compare them (lower is better), as well as a chi-square (likelihood-ratio) test on the difference in deviance between nested models. This should help you decide which variables to keep or remove.
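For reference, the model comparison quantities can be computed by hand from each model's log-likelihood (the deviance is -2 times the log-likelihood). The Python sketch below uses invented log-likelihoods, parameter counts and sample size, and the helper names `aic`, `bic` and `lr_test_1df` are my own; it only covers the case where the larger model adds exactly one parameter.

```python
import math

def aic(log_lik, k):
    """Akaike information criterion; k = number of estimated parameters."""
    return 2 * k - 2 * log_lik

def bic(log_lik, k, n):
    """Bayesian information criterion; n = number of observations."""
    return k * math.log(n) - 2 * log_lik

def lr_test_1df(log_lik_reduced, log_lik_full):
    """Likelihood-ratio test for nested models differing by one parameter.

    The statistic is the drop in deviance; under the null it follows a
    chi-square distribution with 1 degree of freedom.
    """
    stat = 2 * (log_lik_full - log_lik_reduced)
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Hypothetical fits on n = 200 cases:
# reduced model = intercept + 1 predictor, full model = intercept + 2 predictors.
n = 200
ll_reduced, k_reduced = -120.0, 2
ll_full, k_full = -117.0, 3

aic_reduced, aic_full = aic(ll_reduced, k_reduced), aic(ll_full, k_full)
bic_reduced, bic_full = bic(ll_reduced, k_reduced, n), bic(ll_full, k_full, n)
lr_stat, lr_p = lr_test_1df(ll_reduced, ll_full)
```

In this made-up case the full model has the lower AIC and BIC and the likelihood-ratio test is below 0.05, so all three would point toward keeping the extra predictor.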

Hope this helps.
