Starting steps for building a logistic regression model

correlation, logistic regression, statistical significance

I have a categorical response variable. It is binary and represents the win or loss of a deal. Some of the independent variables used to predict the response are also categorical (like Geo, Region, and others…), and each of these has more than 3 categories. The rest of the variables are counts (like the number of face-to-face activities, the number of CXO/VP meetings, and business development activities).

Should I use logistic regression to predict the response variable? If yes, please specify the steps needed to come up with the best model for prediction.

How should I check the quality of each model, so as to decide which one is best?

Best Answer

Since you are unsure whether to use logistic regression to predict the response variable, you may want to look into the broader field of statistical classification. The two are closely related: some sources, including Wikipedia, treat logistic regression as one of the methods within classification.
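For reference, here is a minimal statement of the logistic regression model for a binary win/loss outcome. It assumes, for illustration only, that $y_i = 1$ codes a win and $y_i = 0$ a loss, and that $\mathbf{x}_i$ is deal $i$'s vector of predictors after any categorical variables have been encoded as numbers. The model is linear in the log-odds, and the coefficients are usually estimated by maximizing the log-likelihood $\ell$:

```latex
\log \frac{p_i}{1 - p_i} = \beta_0 + \boldsymbol{\beta}^\top \mathbf{x}_i,
\qquad
p_i = P(y_i = 1 \mid \mathbf{x}_i) = \frac{1}{1 + \exp\!\left(-(\beta_0 + \boldsymbol{\beta}^\top \mathbf{x}_i)\right)}

\ell(\beta_0, \boldsymbol{\beta}) = \sum_{i=1}^{n} \Big[\, y_i \log p_i + (1 - y_i) \log (1 - p_i) \,\Big]
```

Each entry of $\boldsymbol{\beta}$ is the change in the log-odds of winning per unit change in the corresponding predictor, holding the other predictors fixed.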

To handle the categorical variables, use a suitable encoding that transforms them into numbers, for example one-hot (dummy) encoding, which creates a 0/1 column for each category. Also consider normalizing all features: many classification algorithms need this so that the count variables with the largest ranges do not dominate the result.
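A minimal sketch of both steps in plain Python, assuming one-hot encoding for a categorical variable and z-score standardization (mean 0, standard deviation 1) for a count variable. The helper names `one_hot` and `standardize` and the region labels and counts are hypothetical, chosen only for illustration; they are not from the question.

```python
from statistics import mean, pstdev


def one_hot(values):
    """Map each categorical value to a 0/1 indicator vector, one column per level."""
    levels = sorted(set(values))
    return levels, [[1 if v == level else 0 for level in levels] for v in values]


def standardize(column):
    """Rescale a numeric column to mean 0 and population standard deviation 1."""
    mu = mean(column)
    sd = pstdev(column)
    if sd == 0:
        # A constant column carries no information; map it to all zeros.
        return [0.0 for _ in column]
    return [(x - mu) / sd for x in column]


# Hypothetical example data: a region label and a face-to-face activity count per deal.
regions = ["EMEA", "APAC", "AMER", "EMEA"]
f2f_counts = [3, 10, 0, 7]

levels, region_cols = one_hot(regions)
f2f_scaled = standardize(f2f_counts)

# One numeric feature row per deal: region indicators followed by the scaled count.
rows = [enc + [s] for enc, s in zip(region_cols, f2f_scaled)]
print(levels)
print(rows)
```

If you fit an unpenalized logistic regression with an intercept, drop one indicator column per categorical variable and treat it as the reference level; otherwise the indicator columns sum to 1 and are perfectly collinear with the intercept.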

To tell which algorithm works best, or to tune its parameter values once you have selected one, use cross-validation on the data you already have: repeatedly fit each candidate model on part of the data and measure its performance on the part that was held out.
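A minimal sketch of k-fold cross-validation in plain Python, assuming held-out accuracy as the score. The two "models" (`fit_majority`, a majority-class baseline, and `fit_threshold`, a one-feature threshold rule) and the data are hypothetical stand-ins; a logistic regression or any other classifier would plug in the same way, as a function that takes training data and returns a predictor. Accuracy is only one choice of score; log loss and ROC AUC are common alternatives for a win/loss outcome.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_val_accuracy(X, y, fit, k=5, seed=0):
    """Average held-out accuracy of a classifier over k folds.

    `fit(X_train, y_train)` must return a function mapping one feature row to a 0/1 label.
    """
    scores = []
    for test_idx in k_fold_indices(len(y), k, seed):
        test_set = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in test_set]
        predict = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct = sum(predict(X[i]) == y[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)


def fit_majority(X_train, y_train):
    """Baseline: always predict the most frequent training label."""
    label = 1 if 2 * sum(y_train) >= len(y_train) else 0
    return lambda row: label


def fit_threshold(X_train, y_train):
    """Toy rule: predict a win when feature 0 exceeds the midpoint of the two class means."""
    wins = [row[0] for row, lab in zip(X_train, y_train) if lab == 1]
    losses = [row[0] for row, lab in zip(X_train, y_train) if lab == 0]
    if not wins or not losses:
        return fit_majority(X_train, y_train)
    cut = (sum(wins) / len(wins) + sum(losses) / len(losses)) / 2
    return lambda row: 1 if row[0] > cut else 0


# Hypothetical data: one count feature per deal; a deal is "won" when the count is at least 10.
X = [[c] for c in range(20)]
y = [1 if c >= 10 else 0 for c in range(20)]

cv_majority = cross_val_accuracy(X, y, fit_majority, k=5)
cv_threshold = cross_val_accuracy(X, y, fit_threshold, k=5)
print(f"majority baseline: {cv_majority:.2f}, threshold rule: {cv_threshold:.2f}")
```

If you normalize features, compute the means and standard deviations on each training fold only, so that the held-out fold does not leak into the preprocessing and inflate the score.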