Exploratory data analysis for a dataset with continuous and categorical variables

Tags: anova, correlation, exploratory-data-analysis, r, regression

I have a dataset with one dependent variable (DV) and around 40 independent variables (IVs), and I want to select the best variables among them. I could use correlation, but it works only with numeric variables. I would like to see the relationship between continuous and categorical variables.

What variable selection method can handle both continuous and categorical variables, including the relationships between them? I am using R as my modeling tool. Also, is it advisable to convert continuous variables into categorical ones to get better results?

Best Answer

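First of all, it is possible to calculate correlations involving categorical variables, as long as those variables are ordered (ordinal). The correlation between two ordinal variables is referred to as polychoric correlation. It treats each ordinal variable as a coarsened version of an underlying, normally distributed continuous variable and estimates the correlation between those latent variables. The analogous measure between a continuous variable and an ordinal one is called polyserial correlation.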
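To make the latent-variable idea concrete, here is a minimal sketch of the common two-step estimator. It first places each variable's thresholds on the normal scale from its marginal proportions, then runs a one-dimensional maximum-likelihood search for the latent correlation ρ. It is written in Python with NumPy and SciPy purely for illustration; for real analyses in R, the packages discussed in this answer are the tools to use. The function names `bvn_cdf`, `thresholds`, and `polychoric` and the simulated data are hypothetical, and the sketch skips refinements such as standard errors or joint estimation of thresholds.

```python
import numpy as np
from scipy import integrate, optimize, stats


def bvn_cdf(h, k, rho):
    """P(X <= h, Y <= k) for a standard bivariate normal with correlation rho.

    Uses Plackett's identity (d Phi2 / d rho = bivariate normal density):
    Phi2(h, k; rho) = Phi(h) * Phi(k) + integral from 0 to rho of phi2(h, k; r) dr.
    """
    base = stats.norm.cdf(h) * stats.norm.cdf(k)
    if not (np.isfinite(h) and np.isfinite(k)):
        return base  # exact when either limit is -inf or +inf

    def density(r):
        s = 1.0 - r * r
        return np.exp(-(h * h - 2 * r * h * k + k * k) / (2 * s)) / (2 * np.pi * np.sqrt(s))

    extra, _ = integrate.quad(density, 0.0, rho)
    return base + extra


def thresholds(margin):
    """Latent cut points from the cumulative marginal proportions of one variable."""
    cum = np.cumsum(margin)[:-1] / margin.sum()
    return np.concatenate(([-np.inf], stats.norm.ppf(cum), [np.inf]))


def polychoric(x, y):
    """Two-step polychoric correlation between two ordinal (ordered, coded) arrays."""
    _, xi = np.unique(x, return_inverse=True)  # map categories to 0..K-1 in order
    _, yi = np.unique(y, return_inverse=True)
    table = np.zeros((xi.max() + 1, yi.max() + 1))
    np.add.at(table, (xi, yi), 1)  # contingency table of counts

    # Step 1: thresholds from each variable's marginal distribution.
    a = thresholds(table.sum(axis=1))
    b = thresholds(table.sum(axis=0))

    # Step 2: maximize the multinomial log-likelihood of the table over rho.
    def neg_loglik(rho):
        F = np.array([[bvn_cdf(ai, bj, rho) for bj in b] for ai in a])
        p = F[1:, 1:] - F[:-1, 1:] - F[1:, :-1] + F[:-1, :-1]  # cell probabilities
        return -np.sum(table * np.log(np.clip(p, 1e-300, None)))

    res = optimize.minimize_scalar(neg_loglik, bounds=(-0.99, 0.99), method="bounded")
    return res.x


# Hypothetical demo: latent normals with correlation 0.5, observed only as categories.
rng = np.random.default_rng(0)
n, rho = 5000, 0.5
z1 = rng.standard_normal(n)
z2 = rho * z1 + np.sqrt(1 - rho**2) * rng.standard_normal(n)
x = np.digitize(z1, [-0.5, 0.7])       # 3 ordered categories
y = np.digitize(z2, [-1.0, 0.0, 1.0])  # 4 ordered categories
est = polychoric(x, y)
print(f"polychoric estimate: {est:.3f}")  # should be close to the latent 0.5
```

Because only the category counts enter the likelihood, the estimate depends on the order of the categories but not on the numeric labels attached to them.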

Since you plan to use R, you have at least two options for calculating polychoric correlation: 1) the psych package offers polychoric() and related functions (http://www.personality-project.org/r/psych/help/tetrachor.html); 2) the polycor package offers the hetcor() function, which computes a heterogeneous correlation matrix (Pearson correlations between numeric variables, polyserial correlations between numeric and ordinal variables, and polychoric correlations between ordinal variables). Other methods for analyzing models that contain ordered categorical (ordinal) variables include, but are not limited to, numeric recoding, ordinal regression, and the latent variable approach.
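Numeric recoding, the simplest of these alternatives, treats the category codes as ordinary numbers and correlates them directly. The Python sketch below uses simulated, hypothetical data and assumes NumPy and SciPy. It illustrates one known drawback: when the ordinal variables are coarse cuts of correlated normal variables, as assumed here, the Pearson or Spearman correlation of the codes understates the underlying correlation. Polychoric correlation is designed to recover that underlying value.

```python
import numpy as np
from scipy import stats

# Hypothetical data: a latent bivariate normal pair with correlation 0.5,
# observed only through coarse ordered categories.
rng = np.random.default_rng(0)
n, rho = 5000, 0.5
z1 = rng.standard_normal(n)
z2 = rho * z1 + np.sqrt(1 - rho**2) * rng.standard_normal(n)
x = np.digitize(z1, [-0.5, 0.7])       # codes 0, 1, 2
y = np.digitize(z2, [-1.0, 0.0, 1.0])  # codes 0, 1, 2, 3

# Numeric recoding: treat the category codes as if they were interval-scaled numbers.
pearson_codes = np.corrcoef(x, y)[0, 1]
spearman_codes = stats.spearmanr(x, y)[0]

# Reference value computed from the (normally unobserved) latent variables.
pearson_latent = np.corrcoef(z1, z2)[0, 1]

print(f"latent Pearson: {pearson_latent:.3f}")
print(f"codes Pearson:  {pearson_codes:.3f}")
print(f"codes Spearman: {spearman_codes:.3f}")
```

With only three or four categories, the coded correlations come out noticeably below the latent value. The gap shrinks as the number of categories grows, which is one reason numeric recoding is often considered acceptable for ordinal scales with many levels.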