Explaining an extremely large coefficient in a rare events logistic regression

logistic, rare-events

I am running a rare events logistic regression on a binary dependent variable. I have 538 observations and only 10 events (so 528 values of 0 and 10 of 1), which is why I chose to use a rare events logistic regression.

When I run the regression, one of the independent variables in the model has a huge coefficient (around 25,000,000) and is found to be significant. The range on the independent variable is 0 to 1. Is this a problem? Could anyone explain why this is happening?

When I run the same model as an ordinary logistic regression, this variable is insignificant.

I'm not sure what is happening. Any advice would be appreciated.

Best Answer

In all likelihood, you have undiagnosed complete separation / perfect prediction in your model: a combination of the explanatory variables (if you used interactions), or more likely a single explanatory variable, uniquely identifies some of the rare events. Say that whenever x > 10 the outcome is always a one, while for x < 10 there can be a mix of zeroes and ones. Then the greater the coefficient on x, the closer the predicted probability gets to 1 for the cases with x > 10. Since their contribution to the log-likelihood is $\ln \hat p_i$, maximum likelihood keeps pushing that coefficient up as far as it can (while keeping the other coefficients at bay so that the probabilities for x < 10 stay reasonable), and the sky is the limit... except that the finite precision of computer arithmetic prevents that from technically happening, so you will stop somewhere around $\hat p_i = 1-10^{-8}$ or so. This is a known problem for glm in R; Stata diagnoses it and drops the perfectly predicted observations.
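You can reproduce the blow-up in a few lines. Below is a minimal sketch in plain NumPy with made-up data mimicking the question's 0/1 regressor: a dummy `d` that equals 1 only for events (quasi-complete separation). The longer the optimizer runs, the larger the coefficient on `d` grows, because the likelihood has no finite maximizer — the reported "estimate" is just wherever the optimizer happened to stop.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Hypothetical data: whenever d = 1 the outcome is always 1 (quasi-complete
# separation); d = 0 cases are a mix of zeros and ones.
d = np.zeros(n)
d[:10] = 1.0
y = np.where(d == 1, 1.0, (rng.uniform(size=n) < 0.1).astype(float))
X = np.column_stack([np.ones(n), d])  # intercept + dummy

def fit_mle(X, y, iters, lr=0.5):
    """Plain maximum-likelihood logistic regression via gradient ascent."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        beta += lr * X.T @ (y - p) / len(y)
    return beta

# The coefficient on d keeps climbing the longer we optimize: every extra
# increment pushes p-hat for the d = 1 cases a bit closer to 1, which always
# improves the log-likelihood.
b1 = fit_mle(X, y, 5_000)
b2 = fit_mle(X, y, 100_000)
print("after 5k iters:", b1[1], " after 100k iters:", b2[1])
```

With a real solver the same process simply runs until a convergence tolerance or floating-point limit is hit, which is how coefficients like 25,000,000 end up in the output.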

You need to identify which of your explanatory variables perfectly predicts the outcome, and do something about it -- exclude it from the regression, find another measure of the underlying concept, etc. Another solution is Firth logistic regression, which penalizes the likelihood: it is the frequentist counterpart of Bayesian logistic regression with a Jeffreys prior, or, more loosely, a ridge-like regression for binary outcomes.
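For illustration, here is a bare-bones sketch of Firth's correction in NumPy, assuming the textbook form of the method: the score equations get a hat-matrix adjustment $h_i(1/2 - \hat p_i)$, which keeps the estimates finite even under separation. The data are made up, and the crude step-damping stands in for the step-halving that full implementations (e.g. R's logistf) use.

```python
import numpy as np

def firth_logistic(X, y, iters=100):
    """Firth-penalized logistic regression (Jeffreys-prior penalty).

    Fisher scoring on the modified score U*(b) = X'(y - p + h*(1/2 - p)),
    where h is the diagonal of the hat matrix for weights W = p(1-p).
    """
    n, k = X.shape
    beta = np.zeros(k)
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        W = p * (1 - p)
        XtWX = X.T @ (W[:, None] * X)
        # Diagonal of H = W^(1/2) X (X'WX)^(-1) X' W^(1/2)
        h = np.einsum('ij,jk,ik->i', X, np.linalg.inv(XtWX), X) * W
        step = np.linalg.solve(XtWX, X.T @ (y - p + h * (0.5 - p)))
        # Damp large steps for stability (crude substitute for step-halving)
        norm = np.linalg.norm(step)
        if norm > 1.0:
            step /= norm
        beta += step
    return beta

# Separated toy data: a dummy that equals 1 only for events,
# plus a mix of outcomes where the dummy is 0.
d = np.zeros(60)
d[:6] = 1.0
y = np.concatenate([np.ones(6), np.zeros(40), np.ones(14)])
X = np.column_stack([np.ones(60), d])
beta_hat = firth_logistic(X, y)
print(beta_hat)  # finite, modest coefficients despite perfect prediction
```

Plain maximum likelihood would drive the coefficient on the dummy to infinity here; the penalty pulls it back to a finite, interpretable value.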
