Solved – Logistic regression coefficient too high – cannot interpret odds ratio

Tags: interpretation, logistic, odds-ratio, regression, regression-strategies

Question of the day: I'm running a logistic regression (results below), and one coefficient is extremely large in absolute terms. Usually we don't care about this, because it is simply a question of scale: if it really bothers you, multiply this column by 1000 and the coefficient becomes -2.7 (so the original coefficient is roughly -2700). If it doesn't bother you, leave it.
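The scale point can be checked numerically: multiplying a predictor by 1000 divides its maximum likelihood coefficient by exactly 1000 and leaves the intercept and fitted probabilities unchanged. This is a minimal sketch on simulated data; the data, the true slope of -300 and the helper `fit_logistic` are hypothetical, and the fit is a hand-written Newton-Raphson rather than any particular package's routine.

```python
import numpy as np

def fit_logistic(x, y, n_iter=100, tol=1e-10):
    """Hypothetical helper: fit logit P(y=1) = b0 + b1*x by Newton-Raphson."""
    X = np.column_stack([np.ones_like(x), x])
    beta = np.zeros(2)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))          # fitted probabilities
        grad = X.T @ (y - p)                           # score vector
        hess = X.T @ (X * (p * (1.0 - p))[:, None])    # information matrix
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Simulated predictor on a small scale, loosely like a ratio (hypothetical data)
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 0.01, size=2000)
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(0.5 - 300.0 * x)))).astype(float)

b_raw = fit_logistic(x, y)            # coefficient on the original scale
b_scaled = fit_logistic(1000 * x, y)  # same model, predictor multiplied by 1000

print("slope, original scale:", b_raw[1])
print("slope, x1000 scale   :", b_scaled[1])   # equals b_raw[1] / 1000
print("intercepts           :", b_raw[0], b_scaled[0])
```

Because the linear predictor b1 * x is unchanged when x is multiplied by c and b1 is divided by c, the likelihood has the same maximum in both parameterizations; only the units of the coefficient change.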

However, today I needed to interpret the effect of the IVs in terms of odds ratios. As usual, you exponentiate each coefficient, and exp(coeff) is the multiplicative effect on the odds of a 1-unit increase in the corresponding IV. In this case, because the coefficient is so negative, exp(coeff) is effectively 0.
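The zero is a floating-point underflow, not a true zero odds ratio. Note also that a 1-unit change in spendratio2 is larger than its entire observed range (0.0001 to 0.9), so the per-unit odds ratio describes an extrapolation. A minimal sketch, assuming a coefficient of about -2700 (inferred from the -2.7 quoted after rescaling by 1000):

```python
import math

# Assumed coefficient: -2.7 after multiplying the column by 1000 means
# roughly -2700 on the original ratio scale.
beta = -2700.0

# exp() of a large negative number underflows to exactly 0.0 in double
# precision (the smallest positive double is about 4.9e-324, near exp(-744.4)).
odds_ratio_per_unit = math.exp(beta)
print(odds_ratio_per_unit)   # 0.0

# The odds ratio is not really zero, just too small to represent.
# Its base-10 logarithm is easy to compute and report instead.
log10_or = beta / math.log(10)
print(f"odds ratio per 1-unit change = 10^{log10_or:.1f}")
```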

Questions:

  1. Does this mean we can no longer ignore scale, and should always aim for coefficients somewhere between -5 and 5 at most (if we need to interpret the effect on the odds ratio)?

  2. [The important one] Say I want to give someone simple advice on how to handle this situation. The obvious answer is: fix the scale of that specific column. But is there a more general approach, something they can do so that they never end up in this situation? To be specific: what is the best step-by-step way to interpret coefficients (in terms of their effect on the odds ratio) that guarantees you won't run into exp(beta) being 0 or extremely large?

  3. The IV in question, spendratio2, is the ratio of monthly credit card expenditure to yearly income. It ranges from 0.0001 to 0.9, with a mean of 0.068 and a standard deviation of 0.094. I mention this because it is not measured in kilometres or kilograms, and multiplying a ratio by 1000 might make it hard to understand and explain to business users. What do you suggest for rescaling ratios?
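One general recipe for questions 2 and 3 is to choose an increment that is meaningful in the predictor's own units (for a ratio, one percentage point, or one standard deviation) and report exp(beta * delta) for that increment. This is equivalent to rescaling the column, but the rescaling is stated in terms business users can follow. A sketch under the same assumption of a coefficient near -2700; the helper `odds_ratio` is hypothetical. If the odds ratio is still extreme for a change as small as one standard deviation, rescaling alone will not make it interpretable.

```python
import math

BETA = -2700.0   # assumed coefficient on spendratio2 (ratio scale), inferred from the question
SD = 0.094       # standard deviation of spendratio2 reported in the question

def odds_ratio(beta, delta):
    """Multiplicative change in the odds for an increase of `delta` in the predictor."""
    return math.exp(beta * delta)

# spendratio2 = 0.05 means monthly card spending is 5% of yearly income, so a
# change of 0.01 is one percentage point. Multiplying the column by 100 turns
# the coefficient into BETA / 100, with exactly this per-unit interpretation.
per_point = odds_ratio(BETA, 0.01)
per_tenth_point = odds_ratio(BETA, 0.001)
per_sd = odds_ratio(BETA, SD)        # same as standardizing the column first

print(f"per 1 percentage point  : {per_point:.3g}")
print(f"per 0.1 percentage point: {per_tenth_point:.3g}")
print(f"per 1 SD (0.094)        : {per_sd:.3g}")
```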

Thank you,

Kind regards,

Kirill Eremenko

[Image: Logistic Regression Coeff Too High (regression output, not reproduced)]

Best Answer

One likely explanation that has not been offered yet is quasi-separation. For nearly all values of spendratio2 above some threshold, cardhldr has the value 1, and for nearly all values below some threshold (not necessarily the same one), it has the value 0 (or vice versa). With (near) separation the maximum likelihood estimate of the coefficient becomes huge or does not exist at all, which would produce exactly this kind of coefficient. Plotting the proportion of cardhldr = 1 within a number of categories of spendratio2 should show whether I am right.
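The check suggested above can also be done as a table: split spendratio2 into quantile bins and compute the proportion of cardhldr = 1 in each. Under quasi-separation the proportions sit near 1 on one side of a threshold and near 0 on the other. A minimal numpy sketch; the data are simulated (hypothetical), built so that cardhldr is 1 for nearly all values below 0.05 and 0 above it, matching the negative sign in the question, and `binned_proportions` is a hypothetical helper.

```python
import numpy as np

def binned_proportions(x, y, n_bins=10):
    """Proportion of y == 1 within quantile bins of x (a quick separation check)."""
    edges = np.quantile(x, np.linspace(0, 1, n_bins + 1))
    # Digitizing against the interior edges gives bin indices 0..n_bins-1
    idx = np.digitize(x, edges[1:-1], right=True)
    rows = []
    for b in range(n_bins):
        mask = idx == b
        if mask.any():
            rows.append((edges[b], edges[b + 1], int(mask.sum()), float(y[mask].mean())))
    return rows

# Hypothetical data: cardhldr is 1 below spendratio2 = 0.05 and 0 above it,
# with about 2% of labels flipped (quasi- rather than complete separation)
rng = np.random.default_rng(1)
spendratio2 = rng.uniform(0.0001, 0.3, size=1000)
cardhldr = (spendratio2 < 0.05).astype(int)
flip = rng.random(1000) < 0.02
cardhldr[flip] = 1 - cardhldr[flip]

for lo, hi, n, prop in binned_proportions(spendratio2, cardhldr):
    print(f"{lo:.4f}-{hi:.4f}  n={n:4d}  P(cardhldr=1)={prop:.2f}")
```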