Solved – Using multinomial logistic regression for multiple related outcomes

logistic, multinomial-distribution, multivariate-regression

Is it common practice (and adequate) to regroup two binary dependent variables into a single 4-level dependent variable to take advantage of multinomial regression? For instance, say we have information on two related conditions (outcomes) A and B. A new 4-category variable would be defined such that:

category 1 = Neither condition A nor B
category 2 = Condition A (only)
category 3 = Condition B (only)
category 4 = Both conditions A and B
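
As a minimal sketch of the recoding described above, assuming A and B are stored as 0/1 indicators (the function name `combine_outcomes` is made up for illustration):

```python
def combine_outcomes(a, b):
    """Map two binary indicators (0/1) to the 4-level category defined above.

    1 = neither, 2 = A only, 3 = B only, 4 = both.
    """
    if a not in (0, 1) or b not in (0, 1):
        raise ValueError("a and b must each be 0 or 1")
    # A contributes 1, B contributes 2, so the four (a, b) pairs map to 1..4.
    return 1 + a + 2 * b


# Example: recode a small list of (A, B) observations.
observations = [(0, 0), (1, 0), (0, 1), (1, 1)]
categories = [combine_outcomes(a, b) for a, b in observations]
```

The resulting variable is what would be passed as the response to a multinomial model.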

This allows running a single multinomial regression instead of using two binary logistic models that include the same predictors.

Best Answer

As @Riaz Rizvi suggests, this may not be a good idea.

Flattening to a multinomial this way enforces a particular (and rather unlikely) covariance structure on the problem. Since you suspect, or at least wish to allow for the possibility, that the presence of A is informative about B, you should be working with a bivariate probit. Two separate logistic models cannot represent this dependence. The bivariate probit is a regression with an explicit correlated bivariate latent variable generating the choice probabilities, as discussed briefly in the link and at greater length in good econometrics texts.
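
To make the latent-variable idea concrete, here is a rough simulation sketch of the data-generating process behind a bivariate probit. It is not an estimation routine. It assumes fixed linear predictors (`x_beta_a`, `x_beta_b`) and a latent error correlation `rho`; all of these names are illustrative and not taken from any particular library.

```python
import math
import random


def simulate_bivariate_probit(n, x_beta_a, x_beta_b, rho, seed=0):
    """Draw n pairs (y_a, y_b) from a bivariate probit with fixed linear predictors.

    Latent variables: y_a* = x_beta_a + e_a and y_b* = x_beta_b + e_b,
    where (e_a, e_b) is standard bivariate normal with correlation rho.
    Each observed outcome is 1 when its latent variable is positive.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        e_a = z1
        # Mix the two independent normals so that corr(e_a, e_b) = rho.
        e_b = rho * z1 + math.sqrt(1.0 - rho ** 2) * z2
        draws.append((int(x_beta_a + e_a > 0), int(x_beta_b + e_b > 0)))
    return draws


def cell_shares(draws):
    """Return the observed share of each (y_a, y_b) cell."""
    counts = {(0, 0): 0, (1, 0): 0, (0, 1): 0, (1, 1): 0}
    for pair in draws:
        counts[pair] += 1
    return {cell: count / len(draws) for cell, count in counts.items()}


independent = cell_shares(simulate_bivariate_probit(20000, 0.0, 0.0, rho=0.0))
correlated = cell_shares(simulate_bivariate_probit(20000, 0.0, 0.0, rho=0.8))
```

With the same marginal probabilities for A and B, the share of the "both" cell rises as `rho` increases. That single parameter is how the model expresses that A is informative about B.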
