Multinomial logistic regression, weighted logistic regression

binning, logistic, regression, statsmodels

I have a binary response with many predictor variables. The binary response was originally continuous but was converted to binary … if the response was $>1000$ then 1, else 0. I would like to have a model in which responses of greater magnitude are more likely to be 1 than responses of lesser magnitude. I have also thought of splitting the response into more categories … does anyone have any ideas?

Best Answer

Whatever the problem is, you should not bin a continuous response: dichotomizing at 1000 discards exactly the information about magnitude that you want the model to use. You didn't give us much context, so specific advice is difficult to give; please add more. But you say "there are a lot of legitimate 0 values." Why is that a problem? Maybe because you wanted to log-transform an otherwise positive variable? Then there are many other options, for instance modeling $\log(Y+c)$ for some positive constant $c$ (which could be estimated from the data, in a way similar to Box-Cox transforms). Or an extended Box-Cox transform of the form $\frac{(Y+c)^\lambda-1}{\lambda}$ can be used; see Wikipedia or "Transforming variables for multiple regression in R". Or you could simply use a GLM (generalized linear model) with a log link. That log-transforms the (estimated) expectation, not the observations, so zeros in $Y$ cause no difficulty. There are many other possibilities, but tell us more about the context first.

One other possibility that merits mention (since it is not too well known) is ordinal regression for a continuous response. This is implemented in R by the function orm in the package rms, and is discussed at length in Frank Harrell's book "Regression Modeling Strategies".