I have a large data matrix with 6000 rows (observations) and 45 columns: 44 predictor variables (categorical and continuous) and 1 binary response variable (0 or 1). I want to check for correlation/multicollinearity in R. I have looked into cor() and heat maps so far, but it seems that for a data set this size I need to use something else. Please advise.
Solved – Collinearity in R for dataset with 40+ variables
large data, matrix, modeling, multicollinearity, r
Best Answer
I also like VIFs, but another approach is to estimate the mutual information between the various predictors, since it is not limited to linear relationships. The idea is to keep only those covariates that have low mutual information with one another, as they are telling you something different. Check out the infotheo or entropy packages.
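As a rough illustration of the mutual information idea, here is a minimal sketch using infotheo. The data are simulated stand-ins, the column names (x1, x2, x3, g) are hypothetical, and the choice of 10 equal-frequency bins is an arbitrary assumption rather than a recommendation. infotheo expects discrete values, so the continuous columns are binned first.

```r
# Minimal sketch: pairwise mutual information among predictors with infotheo.
# install.packages("infotheo")  # if not already installed
library(infotheo)

set.seed(1)
n <- 6000

# Simulated stand-in for the real predictors (hypothetical column names).
x1 <- rnorm(n)
x2 <- x1^2 + rnorm(n, sd = 0.1)  # depends on x1, but not linearly
x3 <- rnorm(n)                   # generated independently of x1 and x2
g  <- factor(sample(c("a", "b", "c"), n, replace = TRUE))
X  <- data.frame(x1, x2, x3, g)

# infotheo needs discrete data: bin the continuous columns
# (10 equal-frequency bins is an assumption) and turn factors into integer codes.
is_num <- sapply(X, is.numeric)
Xd <- X
Xd[is_num]  <- discretize(X[is_num], disc = "equalfreq", nbins = 10)
Xd[!is_num] <- lapply(X[!is_num], as.integer)

# Matrix of pairwise mutual information (empirical estimator, in nats).
# The diagonal holds each variable's entropy.
mi <- mutinformation(Xd, method = "emp")
print(round(mi, 3))

# For comparison: the Pearson correlation, which only measures linear association.
print(round(cor(X$x1, X$x2), 3))
```

Pairs with a large off-diagonal entry in `mi` are candidates for dropping one of the two variables. The estimates depend on the binning, so it may be worth trying a few values of `nbins` before acting on them.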