Solved – Extremely huge Hazard Ratios from Cox regression


We ran a Cox regression and ended up with huge HRs for some of our variables. One of them (an interaction term) gave us an HR = 2747,093. Our dataset consists of only 74 observations. What is wrong?

Best Answer

Most likely, for a particular level of a categorical variable, at least one person has an event, but for the other levels, no one does. That makes the HR infinite: the extremely large HR you find is the computer's attempt to tell you that the MLE is infinity. In logistic regression, this is called "complete separation"; in Cox regression the same phenomenon is usually called "monotone likelihood", but, regardless of the name, you get the idea.
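To see why the estimate runs off to infinity, here is a minimal sketch in plain Python. It is not what your software does internally; it just evaluates the Cox log partial likelihood (no tied event times) for one binary covariate on a made-up toy dataset in which only the `x = 1` group has events. The function name `cox_partial_loglik` and all the data values are hypothetical, chosen only for illustration.

```python
import math

def cox_partial_loglik(beta, times, events, x):
    """Log partial likelihood for a Cox model with a single covariate.

    Assumes no tied event times. For each observed event, the
    contribution is beta * x_i minus the log of the summed
    exp(beta * x_j) over everyone still at risk at that time.
    """
    loglik = 0.0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # censored observations contribute no event term
        risk_sum = sum(
            math.exp(beta * x[j]) for j in range(n) if times[j] >= times[i]
        )
        loglik += beta * x[i] - math.log(risk_sum)
    return loglik

# Hypothetical toy data: every event occurs in the x = 1 group,
# and the x = 0 group is entirely censored.
times  = [2, 3, 5, 7, 4, 6, 8, 9]
events = [1, 1, 1, 0, 0, 0, 0, 0]
x      = [1, 1, 1, 1, 0, 0, 0, 0]

betas = [0, 2, 4, 8, 16]
logliks = [cox_partial_loglik(b, times, events, x) for b in betas]

for b, ll in zip(betas, logliks):
    print(f"beta={b:>2}  HR={math.exp(b):>12.1f}  loglik={ll:.6f}")
```

On data like this the log partial likelihood keeps rising as beta grows and only flattens out toward a ceiling, so there is no finite maximum. An optimizer simply stops at some large beta once the improvements fall below its tolerance, and exp(beta) is the huge HR that gets reported.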

The root problem when there is complete separation is that your model is overfitting the data. Evidently, for some level of your categorical predictor, there is a very low chance of an event, so you would need a much larger sample size to be able to estimate the HR. It is analogous to a contingency table with some very small expected counts.
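A quick way to check for this is to count events within each level of the predictor before fitting anything. The sketch below uses only the standard library; the helper `events_by_level` and the toy data are hypothetical. For an interaction term, the "level" would be the combination of the interacting factors.

```python
from collections import Counter

def events_by_level(levels, events):
    """Return {level: (number of events, number of subjects)}."""
    n_events = Counter()
    n_total = Counter()
    for lvl, d in zip(levels, events):
        n_total[lvl] += 1
        n_events[lvl] += int(d)
    return {lvl: (n_events[lvl], n_total[lvl]) for lvl in n_total}

# Hypothetical toy data: level "B" has no events at all.
levels = ["A", "A", "A", "A", "B", "B", "B", "B"]
events = [1, 1, 1, 0, 0, 0, 0, 0]

table = events_by_level(levels, events)
for lvl, (d, n) in sorted(table.items()):
    flag = "  <-- no events in this level" if d == 0 else ""
    print(f"level {lvl}: {d} events out of {n}{flag}")
```

Any level with zero events (or only one or two) is a likely source of an exploding or collapsing HR, especially with a sample as small as 74.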
