I am working on sales data. I have a binary win/loss outcome for each opportunity, and the remaining 40+ variables are the activities the sales force carried out on that opportunity (different types of activity).
I built a logistic regression model on the available data set and found huge VIF values for several of the predictors (Xi), so I ran a stepwise variable-reduction procedure to get fewer variables into the model. At the end of this process I was left with 15 independent variables plus the dependent variable.
I then built the same model on a new data set, and again I am getting high VIF values for the variables (around 5610, 3374.020669, 3270.561737, 2.512324, 9.922235, …, etc.).
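For reference, the VIF of predictor j is VIF_j = 1 / (1 - R_j^2), where R_j^2 is the R^2 from regressing that predictor on all the other predictors. A value like 5610 therefore means R_j^2 is above 0.9998, i.e. that activity is almost an exact linear combination of the others. Below is a minimal NumPy sketch of the computation, assuming the activities sit in a numeric matrix X with no constant columns; the column names calls, emails and meetings and the simulated data are hypothetical.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X (n_samples x n_features).

    VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from an OLS regression of
    column j on all the other columns plus an intercept.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])  # add an intercept column
        beta, *_ = np.linalg.lstsq(A, target, rcond=None)
        resid = target - A @ beta
        r2 = 1.0 - resid @ resid / np.sum((target - target.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2) if r2 < 1.0 else np.inf
    return out

# Hypothetical activity counts: "emails" is almost a copy of "calls"
rng = np.random.default_rng(0)
calls = rng.poisson(10, size=500).astype(float)
emails = calls + rng.normal(0, 0.1, size=500)
meetings = rng.poisson(3, size=500).astype(float)
X = np.column_stack([calls, emails, meetings])
v = vif(X)
print(np.round(v, 1))
```

In this toy example calls and emails get VIFs far above 100 because each is nearly a copy of the other, while meetings stays close to 1.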
For the pairs plot and the coefficient results, please refer to the attached picture.
Please suggest what I should do next and how to arrive at a final model with less error.
I am really stuck on how to proceed.
Best Answer
A good approach to reducing the dimension of the feature space in regression is partial least squares (PLS) regression, which finds components (factors) that both explain much of the variance in the predictors and are good at predicting the variable of interest.
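To make the idea concrete, here is a minimal sketch of PLS for a single response (PLS1, NIPALS algorithm) in NumPy. Each component's weight vector is the direction in the standardized predictor space with maximal covariance with the current response, and X and y are deflated after each step, so the resulting score columns are mutually orthogonal and therefore free of collinearity. The function name pls1 and the simulated data (40 columns driven by 3 hypothetical underlying factors) are assumptions for illustration; in practice a library implementation such as scikit-learn's PLSRegression would normally be used.

```python
import numpy as np

def pls1(X, y, n_components):
    """PLS1 via NIPALS for a single response y.

    Returns the score matrix T (n_samples x n_components), the weights W and
    the X loadings P. Columns of T are mutually orthogonal.
    Assumes X has no constant columns (they are standardized below).
    """
    Xk = np.asarray(X, dtype=float)
    Xk = (Xk - Xk.mean(axis=0)) / Xk.std(axis=0)  # standardize predictors
    yk = np.asarray(y, dtype=float)
    yk = yk - yk.mean()
    n, n_features = Xk.shape
    T = np.zeros((n, n_components))
    W = np.zeros((n_features, n_components))
    P = np.zeros((n_features, n_components))
    for k in range(n_components):
        w = Xk.T @ yk                  # direction of maximal covariance with y
        w /= np.linalg.norm(w)
        t = Xk @ w                     # component scores
        tt = t @ t
        p_k = Xk.T @ t / tt            # X loadings
        q_k = yk @ t / tt              # y loading
        Xk = Xk - np.outer(t, p_k)     # deflate X
        yk = yk - q_k * t              # deflate y
        T[:, k], W[:, k], P[:, k] = t, w, p_k
    return T, W, P

# Hypothetical data: 40 collinear columns generated from 3 latent drivers
rng = np.random.default_rng(1)
latent = rng.normal(size=(300, 3))
X = latent @ rng.normal(size=(3, 40)) + 0.05 * rng.normal(size=(300, 40))
y = latent @ np.array([1.0, -0.5, 0.2]) + 0.1 * rng.normal(size=300)

T, W, P = pls1(X, y, n_components=3)
# Regress y on the 3 orthogonal scores instead of the 40 collinear columns
A = np.column_stack([np.ones(300), T])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
fitted = A @ coef
r2 = 1 - np.sum((y - fitted) ** 2) / np.sum((y - y.mean()) ** 2)
print(T.shape, round(r2, 3))
```

The number of components is a tuning parameter; it would normally be chosen by cross-validation rather than fixed at 3 as in this toy example.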
With a few tweaks, this approach can be used for logistic regression too. For a discussion, see this paper, or this one.
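One simple way to carry this over to a binary win/loss outcome is a two-stage approach: extract a few PLS components using the 0/1 outcome as the response, then fit an ordinary logistic regression on those components. This is a sketch of that simple variant only, not necessarily the method in the linked papers, which describe more refined versions. The helper names pls_scores and logistic_irls and the simulated win/loss data are hypothetical; the logistic fit uses plain Newton/IRLS iterations and assumes the classes are not perfectly separable.

```python
import numpy as np

def pls_scores(X, y, n_components):
    """PLS1 (NIPALS) scores for X given a single response y (here 0/1)."""
    Xk = (X - X.mean(axis=0)) / X.std(axis=0)      # standardize predictors
    yk = y - y.mean()
    T = np.zeros((X.shape[0], n_components))
    for k in range(n_components):
        w = Xk.T @ yk
        w /= np.linalg.norm(w)
        t = Xk @ w
        Xk = Xk - np.outer(t, Xk.T @ t / (t @ t))  # deflate X
        yk = yk - (yk @ t / (t @ t)) * t           # deflate y
        T[:, k] = t
    return T

def logistic_irls(Z, y, n_iter=25):
    """Logistic regression with intercept, fitted by Newton/IRLS."""
    A = np.column_stack([np.ones(len(y)), Z])
    beta = np.zeros(A.shape[1])
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-A @ beta))
        grad = A.T @ (y - prob)                              # score vector
        hess = A.T @ (A * (prob * (1 - prob))[:, None])      # A^T W A
        beta += np.linalg.solve(hess, grad)                  # Newton step
    return beta

# Hypothetical data: 40 collinear activities driven by 3 latent factors
rng = np.random.default_rng(2)
n = 500
latent = rng.normal(size=(n, 3))
X = latent @ rng.normal(size=(3, 40)) + 0.05 * rng.normal(size=(n, 40))
logit = 1.5 * latent[:, 0] - 1.0 * latent[:, 1]
won = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(float)  # simulated win/loss

T = pls_scores(X, won, n_components=3)
Z = T / T.std(axis=0)                      # put scores on a common scale
beta = logistic_irls(Z, won)
prob = 1 / (1 + np.exp(-(beta[0] + Z @ beta[1:])))
accuracy = np.mean((prob > 0.5) == (won == 1))
print(np.round(beta, 2), round(accuracy, 3))
```

Because the PLS scores are centered and mutually orthogonal, every predictor in the second-stage logistic model has a VIF of 1.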