Should a predictor, significant on its own but not with other predictors, be included in an overall multinomial logistic regression?

logistic, multinomial-distribution, predictor, regression coefficients, statistical significance

I constructed a model via multinomial logistic regression analysis. The final model contains three predictors. Each predictor is significant when it is the only predictor in the model. However, the coefficient of one of the predictors is not significant when all three predictors are included in the model.

Should I include this predictor in the final multinomial logistic regression equation?

Best Answer

It depends on whether you are doing (a) predictive research, where you don't care about what is causally responsible, only what serves as an efficient set of indicators, or (b) explanatory research, where you want to disentangle causal relationships as much as you can.

In the latter, when multiple correlated predictors vie for a role in your equation, you would care about such things as giving "causal credit" to earlier factors over later ones, since what comes later could never cause what came before, but sometimes the reverse is true. You would care about giving more "credit" to relatively objective, relatively fixed variables such as marital status or ethnicity than to relatively subjective, changeable ones such as attitudes and opinions. And (and here I'm paraphrasing James Davis's The Logic of Causal Order) you would want to choose more generative factors such as socioeconomic status over less generative ones such as what brand of toothpaste a person uses.

When your candidate predictors are correlated, no statistical algorithm (such as stepwise regression) can deal with these issues of explanation. It is up to you as a researcher to think through your candidate variables and choose those that will best serve your purpose. It is only in purely predictive research that you can ignore such issues and simply choose those predictors that account for the most variance in the outcome or, in your case, produce the highest pseudo-R-squared.
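For the purely predictive case, comparing candidate models by pseudo-R-squared can be sketched as follows. This is only an illustration under stated assumptions: it uses McFadden's definition (one of several pseudo-R-squared measures; the question does not say which one is meant), the outcome labels are toy data, and the fitted probabilities for the two candidate models are invented rather than taken from a real fit. The helper names `log_likelihood`, `null_log_likelihood`, and `mcfadden_r2` are hypothetical.

```python
import math
from collections import Counter


def log_likelihood(labels, probs):
    """Sum of the log predicted probabilities assigned to the observed classes."""
    return sum(math.log(p[y]) for y, p in zip(labels, probs))


def null_log_likelihood(labels):
    """Log-likelihood of an intercept-only model, which predicts every
    class with its observed relative frequency."""
    n = len(labels)
    counts = Counter(labels)
    return sum(c * math.log(c / n) for c in counts.values())


def mcfadden_r2(labels, probs):
    """McFadden's pseudo-R-squared: 1 - LL(model) / LL(null)."""
    return 1.0 - log_likelihood(labels, probs) / null_log_likelihood(labels)


# Toy outcome with three classes (0, 1, 2); invented for illustration only.
labels = [0, 0, 1, 1, 2, 2]

# Hypothetical fitted class probabilities from two candidate models.
# Each row is one case; entry k is the predicted probability of class k.
probs_a = [
    [0.6, 0.2, 0.2], [0.6, 0.2, 0.2],
    [0.2, 0.6, 0.2], [0.2, 0.6, 0.2],
    [0.2, 0.2, 0.6], [0.2, 0.2, 0.6],
]
probs_b = [
    [0.7, 0.2, 0.1], [0.7, 0.1, 0.2],
    [0.2, 0.7, 0.1], [0.1, 0.7, 0.2],
    [0.1, 0.2, 0.7], [0.2, 0.1, 0.7],
]

r2_a = mcfadden_r2(labels, probs_a)
r2_b = mcfadden_r2(labels, probs_b)

# In purely predictive work, the model with the higher value would be preferred.
print(f"model A pseudo-R2: {r2_a:.3f}")
print(f"model B pseudo-R2: {r2_b:.3f}")
```

In practice the probabilities would come from fitted multinomial models with and without the predictor in question. A comparison like this says nothing about which predictor deserves causal credit, which is the point made above for explanatory research.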

Your question gets to the heart of important issues in multivariate modelling of many types, and if more than 5 tags were allowed I would have also listed multicollinearity, model-building, and/or variable selection.