Solved – Exploratory Factor Analysis for Binary Logistic Regression Variable Selection

feature selection, logistic, r, regression, sas

I am interested in learning methods (new to me, at least) for variable selection in binary logistic regression. I work with more than 500 potential predictor variables and need to select 8 to 15 of them to build a parsimonious predictive model, without resorting to the notorious stepwise techniques.

Has anyone used PROC FACTOR for variable selection in binary logistic regression? I expect my factors to be correlated, so I will use promax rotation. From the results of the exploratory factor analysis (EFA), I would simply retain, within each factor, the variable with the highest loading on that factor (latent variable models would confuse the hell out of the end users of 99.999% of my models!). I would then reduce the set further with another technique, such as randomForest, until few enough variables remain to build a model with fewer than 15 of them.
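A minimal sketch of the retention rule just described, written in Python rather than SAS purely for illustration. The loading table, the variable names x1 to x6, and the helper `top_variable_per_factor` are all hypothetical; in practice the numbers would come from the rotated factor pattern that the EFA software reports.

```python
# Hypothetical promax pattern loadings: variable -> loadings on (factor 0, factor 1).
# The numbers are made up for illustration only.
loadings = {
    "x1": [0.82, 0.05],
    "x2": [0.74, 0.12],
    "x3": [0.61, -0.08],
    "x4": [0.10, 0.79],
    "x5": [-0.03, 0.88],
    "x6": [0.15, 0.55],
}

def top_variable_per_factor(loadings):
    """Keep, for each factor, the variable with the largest absolute loading."""
    best = {}  # factor index -> (variable name, absolute loading)
    for var, row in loadings.items():
        # Assign the variable to the factor on which it loads most strongly.
        factor = max(range(len(row)), key=lambda j: abs(row[j]))
        strength = abs(row[factor])
        if factor not in best or strength > best[factor][1]:
            best[factor] = (var, strength)
    return {f: name for f, (name, _) in sorted(best.items())}

selected = top_variable_per_factor(loadings)
print(selected)  # {0: 'x1', 1: 'x5'}
```

Note that this rule looks only at the predictors: it never consults the binary outcome, so the retained variables are representative of each factor but not necessarily the most predictive ones.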

Does anyone have thoughts on this process? Any suggested readings or input would be greatly appreciated. Thanks!

Best Answer

One of the guiding principles of exploratory factor analysis is to extract meaningful factors. You should therefore be able to run EFA, discover patterns among your predictor variables, and extract meaningful latent variables that your "end users" could understand. If EFA is too much for them, tell them that you added x1, x2, and x3 together to create a scale called "interesting factor name" and used that to predict your outcome. The additive scales will usually correlate very highly (often around .99) with the actual factor scores anyway.
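The claim about additive scales can be checked with a small simulation. The sketch below is only an illustration under stated assumptions: a single factor, three standardized indicators with assumed loadings of 0.8, 0.7 and 0.6, and loadings treated as known rather than estimated. For a one-factor model, regression-method factor score weights are proportional to loading divided by uniqueness, which is what the code uses.

```python
import math
import random

def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

random.seed(0)
loadings = [0.8, 0.7, 0.6]  # assumed standardized loadings (hypothetical)
n = 5000

sum_scores, factor_scores = [], []
for _ in range(n):
    f = random.gauss(0, 1)  # latent factor value
    # Each indicator = loading * factor + unique noise, so its variance is 1.
    xs = [l * f + math.sqrt(1 - l * l) * random.gauss(0, 1) for l in loadings]
    sum_scores.append(sum(xs))  # unit-weighted additive scale
    # Regression-method weights (up to a constant): loading / uniqueness.
    factor_scores.append(sum(l / (1 - l * l) * x for l, x in zip(loadings, xs)))

r = pearson(sum_scores, factor_scores)
print(round(r, 3))
```

With these assumed loadings the correlation lands in the high .9s. The more unequal the loadings within a factor, the further the simple sum drifts from the factor score.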

Also, explore the dimensionality of the data, bearing in mind the need to extract meaningful factors. R functions such as VSS() (in the psych package) can help.
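As a much cruder stand-in for that kind of check (this is not the VSS algorithm), the sketch below counts the eigenvalues of a correlation matrix that exceed 1, the so-called Kaiser rule. The 6-by-6 correlation matrix is hypothetical: two blocks of three variables, correlating .6 within a block and .1 across blocks. The small Jacobi routine is only there to keep the example free of dependencies; in practice a library eigen-solver would be used.

```python
import math

def symmetric_eigenvalues(matrix, max_sweeps=50, tol=1e-20):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations."""
    a = [row[:] for row in matrix]
    n = len(a)
    for _ in range(max_sweeps):
        # Stop once the off-diagonal mass is negligible.
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle that zeroes the (p, q) entry.
                if a[p][p] == a[q][q]:
                    theta = math.pi / 4
                else:
                    theta = 0.5 * math.atan(2 * a[p][q] / (a[q][q] - a[p][p]))
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)

def corr(i, j):
    """Hypothetical correlations: .6 within a block of three, .1 across blocks."""
    if i == j:
        return 1.0
    return 0.6 if i // 3 == j // 3 else 0.1

R = [[corr(i, j) for j in range(6)] for i in range(6)]
eigenvalues = symmetric_eigenvalues(R)
n_factors = sum(1 for ev in eigenvalues if ev > 1.0)  # Kaiser rule
print([round(ev, 2) for ev in eigenvalues], n_factors)
```

For this matrix the eigenvalues are 2.5, 1.9, and 0.4 four times over, so the rule suggests two factors, matching the two blocks that were built in. On real data the rule is known to be rough, which is one reason to also weigh whether the resulting factors are interpretable.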