Solved – How to remove multicollinearity from the logistic regression model

Tags: data mining, logistic, logit, r, regression

I am working on sales data. I have a binary win/loss variable for each opportunity, and the remaining 40+ variables record the different types of activities the sales force carried out for that opportunity.

I built a logistic model on the available data set and found huge VIF values for several of the Xi's, so I ran a stepwise variable-reduction procedure to get fewer variables into the model. At the end of this process I was left with 15 independent variables plus the dependent variable.

I then rebuilt the same model on the new data set, and I am again getting very high VIFs (around 5610, 3374.020669, 3270.561737, 2.512324, 9.922235, etc.) for the variables.
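For reference, my workflow looks roughly like the sketch below (the `sales` data frame and the `win` column are placeholder names for my actual data):

```r
# Fit the full logistic model, reduce it stepwise, and check VIFs.
# `sales` / `win` are placeholder names for the real data set.
library(car)   # provides vif()

full    <- glm(win ~ ., data = sales, family = binomial)
reduced <- step(full, direction = "both", trace = FALSE)  # AIC-based stepwise reduction

vif(reduced)   # values far above ~5-10 still indicate strong collinearity
```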

The pairs graph and the coefficient results are in the attached images (pairs graph, result).

Please suggest what I should do next and how to arrive at a final model with less error. I am stuck on how to proceed.

Best Answer

A good approach to reducing the dimension of the feature space in regression is partial least squares (PLS) regression, which finds components that both explain the variance in the feature space and predict the variable of interest.

With a few tweaks, this approach can be used for logistic regression too. For a discussion, see this paper or this one.
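As a rough illustration of the idea in R, one simple two-stage version (not the exact method from the linked papers) is to extract PLS components with the `pls` package and then fit the logistic model on the component scores, which are orthogonal by construction. The `sales`/`win` names below are the placeholders from the question:

```r
# Two-stage PLS + logistic regression sketch; `sales`/`win` are placeholders
# and the outcome is assumed to be coded 0/1.
library(pls)

X <- as.matrix(sales[, setdiff(names(sales), "win")])
y <- as.numeric(sales$win)

pls_fit <- plsr(y ~ X, ncomp = 5, scale = TRUE)   # choose ncomp by cross-validation in practice

scores        <- as.data.frame(unclass(pls_fit$scores))
names(scores) <- paste0("comp", seq_len(ncol(scores)))
scores$win    <- sales$win

logit_fit <- glm(win ~ ., data = scores, family = binomial)
summary(logit_fit)   # the component scores are uncorrelated, so VIFs stay near 1
```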
