Survival analysis with rare events: Is it legitimate to use a fixed continuity correction for hazard ratio calculation in a single study

survival

We have a survival dataset in which the control group has zero events, while the other group has a few events. Since the hazard ratio calculation would involve division by zero, it appears to be common to correct for this by adding a fixed number (usually 0.5) to all cells. However, my research on this correction has turned up almost exclusively meta-analysis studies [1-3].
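To make concrete what this correction does, here is a minimal sketch (Python, with made-up counts rather than our actual data) of the 2x2-table setting that the cited references describe for odds ratios rather than hazard ratios: add 0.5 to every cell, then compute the estimate and a Wald confidence interval.

```python
import math

# Hypothetical 2x2 table with a zero cell (made-up numbers, not our data):
#                 events   no events
#   treatment       a=5       b=45
#   control         c=0       d=50
a, b, c, d = 5, 45, 0, 50

# The raw odds ratio (a*d)/(b*c) is undefined because c == 0.
# Fixed continuity correction: add 0.5 to every cell of the table.
a, b, c, d = (x + 0.5 for x in (a, b, c, d))

or_corrected = (a * d) / (b * c)
se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)   # Wald SE on the log scale
ci = (math.exp(math.log(or_corrected) - 1.96 * se_log_or),
      math.exp(math.log(or_corrected) + 1.96 * se_log_or))
print(or_corrected, ci)   # roughly 12.2, with a CI of about (0.7, 230)
```

The estimate becomes finite, but the interval is extremely wide and driven almost entirely by the corrected zero cell.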

It is not immediately clear to me whether this correction can also be legitimately applied to our single study. It seems counterintuitive to report a hazard ratio at all, given that the control group has zero hazard at all observed times in our dataset, meaning that the other group would have an "infinitely higher" risk (which, of course, does not reflect reality either).

Similar problems have been discussed before [4-7], but I could not find published, citable evidence that using a fixed correction is legitimate outside of meta-analysis, or more specifically, in a single cohort like ours.

References

[1] https://handbook-5-1.cochrane.org/chapter_16/16_9_2_studies_with_zero_cell_counts.htm

[2] https://onlinelibrary.wiley.com/doi/abs/10.1002/sim.2528

[3] https://ebmh.bmj.com/content/21/2/72.long

[4] Cox regression when reference group had zero events

[5] Dealing with no events in one treatment group – survival analysis

[6] https://www.researchgate.net/post/How_to_calculate_OR_odd_ratio_if_one_of_groups_is_0_in_a_case-control_study

[7] https://www.researchgate.net/post/How_can_I_calculate_the_odds_ratio_CI_and_P_values_when_I_have_a_null_value



Best Answer

GraphPad Prism uses what its software guide calls the "Mantel-Haenszel" approach as one of its ways to estimate a hazard ratio (HR). That is based on the difference between the number of events observed in one of the groups and the number of events that would have been expected in that group if there were no difference in survival between the 2 groups. It essentially works with an HR estimate between the survival curve of the group with events and a weighted average survival curve including both groups. That provides estimates of the HR and of associated confidence intervals (CI), even if there are no events in one of the 2 groups.
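To sketch roughly what such an estimate looks like (this is an illustration of the general (O - E)/V approach built from the usual log-rank quantities, not GraphPad Prism's actual code, and the data below are made up):

```python
import math
from itertools import chain

def logrank_quantities(time1, event1, time2, event2):
    """Observed (O) and expected (E) events per group, plus the
    hypergeometric variance V, summed over distinct event times."""
    O1 = E1 = O2 = E2 = V = 0.0
    event_times = sorted({t for t, e in chain(zip(time1, event1),
                                              zip(time2, event2)) if e})
    for t in event_times:
        n1 = sum(1 for x in time1 if x >= t)   # at risk in group 1
        n2 = sum(1 for x in time2 if x >= t)   # at risk in group 2
        d1 = sum(1 for x, e in zip(time1, event1) if e and x == t)
        d2 = sum(1 for x, e in zip(time2, event2) if e and x == t)
        n, d = n1 + n2, d1 + d2
        O1 += d1
        O2 += d2
        E1 += d * n1 / n
        E2 += d * n2 / n
        if n > 1:
            V += d * (n1 / n) * (n2 / n) * (n - d) / (n - 1)
    return O1, E1, O2, E2, V

# Made-up data: group 1 has 4 events, group 2 (the "control") has none.
t1, e1 = [2, 3, 5, 6, 8, 9], [1, 1, 1, 1, 0, 0]
t2, e2 = [4, 7, 8, 9, 10, 12], [0, 0, 0, 0, 0, 0]
O1, E1, O2, E2, V = logrank_quantities(t1, e1, t2, e2)

# "Mantel-Haenszel"-style estimate: finite even though group 2 has no events.
hr_mh = math.exp((O1 - E1) / V)
ci_mh = (math.exp((O1 - E1) / V - 1.96 / math.sqrt(V)),
         math.exp((O1 - E1) / V + 1.96 / math.sqrt(V)))
print(hr_mh, ci_mh)   # roughly 9.7, with a CI of about (1.3, 70)
```

Note that the point estimate and CI here come entirely from the log-rank O, E and V quantities; no 0.5 is added to anything.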

The other method used by GraphPad Prism for HR estimates is what they call the "logrank" approach.* This is based on the ratio of the observed-to-expected event counts in the 2 groups, and will thus give either 0 or infinity for the HR in your case, consistent with your expectation. As its software guide says: "the [HR] results can differ when several subjects die at the same time or when the hazard ratio is far from 1.0."
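Continuing the same sketch, the observed/expected-ratio version takes only a couple of extra lines and degenerates exactly as described when one group has no events:

```python
# "Logrank"-style estimate: ratio of the observed/expected ratios.
# With zero events in group 2 this is division by zero, i.e. an "infinite" HR.
hr_logrank = (O1 / E1) / (O2 / E2) if O2 > 0 else float('inf')
print(hr_logrank)   # inf for the made-up data above
```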

The advantage of using the "Mantel-Haenszel" HR estimate in this type of situation is that it at least gives some idea of the uncertainty in the hazard ratio and thus of the effect size. One disadvantage is that you have no way to test the underlying proportional-hazards assumption that makes the HR interpretable. Another possible disadvantage is that I'm not sure how valid the assumptions underlying the calculation of the HR and CI would be in this situation, even if hazards truly were proportional in a way that would become apparent with more data or later time points.

Provided that authors appropriately cite the method used (something like: "Mantel-Haenszel estimates from GraphPad Prism, version 7"), there is nothing necessarily wrong with presenting those HR and CI values. You are certainly correct, however, that the interpretation of those values can be open to question.

What you found about adding 0.5 to cells with 0 counts as a "continuity correction" isn't what GraphPad Prism is doing here. That is done in meta-analyses to allow pooling of information from contingency tables across multiple studies. Adding to the confusion in nomenclature, the associated test for such meta-analysis is often called the "Cochran-Mantel-Haenszel" test.
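For contrast, here is a minimal, purely hypothetical sketch of that kind of pooling: two made-up 2x2 tables combined with the Mantel-Haenszel pooled odds ratio, with the fixed 0.5 correction (as in the Cochrane handbook linked in the question) applied only to the table that contains a zero cell.

```python
# Hypothetical meta-analysis pooling of 2x2 tables (a, b, c, d) =
# (treatment events, treatment non-events, control events, control non-events).
studies = [(5, 45, 0, 50),    # study 1: zero cell in the control column
           (8, 92, 3, 97)]    # study 2: no zero cells

def mh_pooled_or(tables, correction=0.5):
    """Mantel-Haenszel pooled odds ratio across several 2x2 tables."""
    num = den = 0.0
    for a, b, c, d in tables:
        if 0 in (a, b, c, d):                    # fixed correction for zero cells
            a, b, c, d = (x + correction for x in (a, b, c, d))
        n = a + b + c + d
        num += a * d / n
        den += b * c / n
    return num / den

print(mh_pooled_or(studies))   # a single pooled odds ratio, roughly 4.1
```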


*They also note that what they call the "Mantel-Haenszel" approach is instead called the "logrank" approach in the reference they cite (but don't seem to link to). Terminology here can be quite confusing and differs among sources.
