Solved – Binary logistic regression with 3 similar outcomes

Tags: logistic, multivariate analysis

I was given three binary dependent variables, which are the following:

1) Paid in Full (Subject paid their full balance)
2) Settled in Full (Subject paid 80% or more of their balance in 1, 3, or 5 payments)
3) Rehab (Subject made 10 or more payments)

With these three binary dependent variables, the goal of this project is to build a predictive model to predict the odds of Paid in Full (PIF), Settled in Full (SIF), or Rehab. I have 250k records and ~500 potential predictor variables to work with.

My question is how best to capture these three types of payers. I was told to combine all three dependent variables into a single dependent variable, coded Paid = 1 (PIF, SIF, or Rehab) and Not Paid = 0 (all other records), and to predict this single outcome from the pool of predictor variables using binary logistic regression. However, I do not believe this is the best approach, as the factors that lead to PIF likely differ from those that lead to SIF and/or Rehab.
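To make the concern concrete: under the pooled coding, binary logistic regression estimates a single coefficient vector β shared by all three payer types, so each predictor is forced to have the same effect on the log-odds of PIF, SIF, and Rehab. Here x denotes a record's predictor vector (notation added for illustration):

```latex
\Pr(\text{Paid} = 1 \mid x) = \frac{\exp(x^\top \beta)}{1 + \exp(x^\top \beta)}
```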

Is there a better way to model this data than a single outcome variable with binary logistic regression?

Any references or explanations are greatly appreciated!

Best Answer

It sounds like a multinomial logit should work: your states are competing risks. Look at the mlogit command in Stata.
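A multinomial logit estimates a separate coefficient vector β_j for each payment outcome j relative to a baseline category (here Not Paid, whose coefficients are fixed at zero), giving P(Y = j | x) = exp(x'β_j) / (1 + Σ_k exp(x'β_k)). The minimal sketch below only evaluates these probabilities for given coefficients; it is not Stata's implementation, and the function name, outcome labels, and coefficient values are hypothetical. Estimation itself would be done with mlogit or an equivalent routine (for example statsmodels' MNLogit).

```python
import math

# Baseline (reference) category; its coefficients are fixed at zero for identification.
BASELINE = "not_paid"

def mlogit_probabilities(x, coefs):
    """Return P(outcome = j | x) under a multinomial logit model.

    x     : list of predictor values, with a leading 1.0 for the intercept
    coefs : dict mapping each non-baseline outcome to its coefficient list
    """
    # Linear index x'beta_j for each non-baseline outcome; the baseline's index is 0.
    scores = {BASELINE: 0.0}
    for outcome, beta in coefs.items():
        if len(beta) != len(x):
            raise ValueError(f"coefficient length mismatch for {outcome!r}")
        scores[outcome] = sum(b * xi for b, xi in zip(beta, x))
    # Softmax, shifted by the largest score for numerical stability.
    top = max(scores.values())
    expd = {k: math.exp(v - top) for k, v in scores.items()}
    total = sum(expd.values())
    return {k: v / total for k, v in expd.items()}

# Made-up coefficients (intercept, one predictor), for illustration only.
coefs = {
    "PIF": [-2.0, 0.5],
    "SIF": [-1.5, -0.3],
    "Rehab": [-1.0, 0.1],
}
probs = mlogit_probabilities([1.0, 2.0], coefs)
# Probability of any of the three payment outcomes.
p_paid = 1.0 - probs[BASELINE]
```

Because the three payment probabilities sum to 1 − P(Not Paid), the model still yields an overall probability of paying (p_paid), while letting each predictor act differently on PIF, SIF, and Rehab.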

UPDATE: Here is a paper with an example application of the mlogit model: Appendix A: Econometric Analysis of Mortgages.

I think Begg and Gray (1984) were the first to use it in this setup; the paper is referenced from the link above. As long as your states are mutually exclusive and exhaustive, this should work.
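Before fitting, the three binary flags have to be collapsed into one categorical outcome that satisfies that condition. The minimal sketch below assumes each record is a dict with hypothetical keys "PIF", "SIF", and "Rehab" coded 0/1. A "not_paid" category absorbs all remaining records, which makes the set exhaustive. Records with more than one flag set are flagged rather than silently reassigned: under the stated definitions the outcomes may overlap (a full balance paid in one payment could count as both PIF and SIF), and how to resolve that is a substantive choice for the analyst.

```python
# Hypothetical flag names; adapt to the real column names.
OUTCOMES = ("PIF", "SIF", "Rehab")

def encode_outcome(record):
    """Collapse three 0/1 flags into one label: "PIF", "SIF", "Rehab", or "not_paid".

    Raises ValueError if more than one flag is set, because a multinomial
    logit requires mutually exclusive categories.
    """
    active = [name for name in OUTCOMES if record.get(name, 0) == 1]
    if len(active) > 1:
        raise ValueError(f"overlapping outcomes: {active}")
    # "not_paid" absorbs every record with no flag set, making the set exhaustive.
    return active[0] if active else "not_paid"

def find_overlaps(records):
    """Return indices of records with more than one flag set, for inspection."""
    return [
        i for i, rec in enumerate(records)
        if sum(rec.get(name, 0) == 1 for name in OUTCOMES) > 1
    ]

# Tiny made-up example records.
records = [
    {"PIF": 1, "SIF": 0, "Rehab": 0},
    {"PIF": 0, "SIF": 0, "Rehab": 0},
    {"PIF": 1, "SIF": 1, "Rehab": 0},  # overlap: must be resolved before modeling
]
overlaps = set(find_overlaps(records))
# Overlapping records are skipped here only for the demo; resolve them on substantive grounds.
labels = [encode_outcome(r) for i, r in enumerate(records) if i not in overlaps]
```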
