Proportional Hazards – How to Handle Proportional Hazards Violations in Log-Rank and Related Tests

assumptions, hazard, logrank-test, survival

I have (censored) time-to-event data for subjects in four groups. I would like to do something like a logrank test, but the survival curves do not satisfy the proportional hazards assumption. I have heard that the consequence of a proportional hazards violation is a loss of power to detect differences between survival curves. In my study, however, the logrank tests do show significant differences between the groups.

The proportional hazards violation takes the following form: one group has a relatively larger probability of an event early in the observation period, and another group has a relatively larger probability of an event late in the observation period. I believe that the G-$\rho$ family of tests (for example in the survdiff function in R's survival package) can be parameterized so that the earlier or later portion of the observation period is weighted more heavily. In that case, however, different groups would "do better" (the event being studied is a good thing) under different specifications of the test.

I would like to know several things:

  1. If the logrank test does find a significant difference despite the presence of a proportional hazards violation, can we interpret this as indicating a true (overall or average) difference between the curves? Or does this violation mean the test's results are totally meaningless?
  2. Is there a principled way of describing survival times in cases like mine? I would ideally like to be able to report an overall hazard ratio (I know this would lack external validity with non-proportional hazards and censored observations, but it would be useful in describing the experiment), as well as give information about which groups were more likely to have events at which times. I could choose a break point in the middle of the observation period and do separate tests before and after it (assuming these subsets of the data did satisfy the proportional hazards assumption), but the choice of such a point feels somewhat ad hoc.

Related discussion:

This thread discusses alternatives to the logrank test, but doesn't consider my issue:
What are the pros and cons of using the logrank vs. the Mantel-Haenszel method for computing the Hazard Ratio in survival analysis?

Best Answer

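The log-rank test is valid whether or not the hazards are proportional: under the null hypothesis of identical hazards it keeps its nominal type I error rate. You are correct that only its power is affected. So if it rejects, you have evidence that the hazards are not all equal. If it does not reject, then you have to worry that non-proportional hazards have cost you power, so that a real difference may have gone undetected.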
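For readers who want to see what the test actually computes, here is a minimal sketch of the two-group log-rank statistic with the G-$\rho$ weights mentioned in the question. It is written from scratch in plain Python for illustration: the function name `weighted_logrank` is hypothetical, it handles only two groups (coded 0 and 1) rather than four, and it is not the implementation used by survdiff, whose details may differ.

```python
import math

def weighted_logrank(times, events, groups, rho=0.0):
    """Two-group G-rho test (illustrative sketch).

    times  : follow-up times
    events : 1 if the event was observed, 0 if censored
    groups : 0 or 1 for each subject
    rho    : weight exponent; rho=0 is the ordinary log-rank test
    Returns (u, chi2, p): the weighted observed-minus-expected sum
    for group 1, the chi-square statistic (1 df), and its p-value.
    """
    data = sorted(zip(times, events, groups))
    at_risk = [groups.count(0), groups.count(1)]
    surv = 1.0   # pooled Kaplan-Meier estimate just before the current time
    u = 0.0      # weighted sum of (observed - expected) events in group 1
    var = 0.0    # variance of u under the null hypothesis
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = [0, 0]
        leaving = [0, 0]
        # Collect everyone whose follow-up ends at time t (handles ties).
        while i < len(data) and data[i][0] == t:
            _, e, g = data[i]
            leaving[g] += 1
            if e:
                deaths[g] += 1
            i += 1
        n = at_risk[0] + at_risk[1]
        d = deaths[0] + deaths[1]
        if d > 0:
            w = surv ** rho
            expected1 = d * at_risk[1] / n
            u += w * (deaths[1] - expected1)
            if n > 1:
                # Hypergeometric variance of the group-1 event count.
                var += (w * w * d * (at_risk[0] / n) * (at_risk[1] / n)
                        * (n - d) / (n - 1))
            surv *= 1.0 - d / n
        at_risk[0] -= leaving[0]
        at_risk[1] -= leaving[1]
    chi2 = u * u / var if var > 0 else 0.0
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p = math.erfc(math.sqrt(chi2 / 2.0))
    return u, chi2, p
```

With `rho=0` every event time gets the same weight; with `rho=1` the weights shrink as the pooled survival estimate falls, so early differences dominate. When the curves cross, the sign of `u` can depend on `rho`, which is exactly the ambiguity described in the question.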

The principled approach would be to estimate the difference or ratio of the hazards as a function of time. This is not simple, but it is doable. I would recommend the book by Martinussen and Scheike, Dynamic Regression Models for Survival Data, and the corresponding R package timereg. The support of a knowledgeable statistician would probably also be needed. Note that this goes beyond standard survival analysis fare, so not everybody will know these techniques.
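As a much cruder stand-in for those methods, one can at least look at how the hazard difference between two groups changes over time. The sketch below is an assumption-laden illustration, not what the book or timereg does: it computes a Nelson-Aalen cumulative hazard per group and compares average hazards within user-chosen time windows. The function names and the window edges are hypothetical.

```python
def nelson_aalen(times, events):
    """Nelson-Aalen cumulative hazard as [(t, H(t))] at distinct event times."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    cum = 0.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Everyone whose follow-up ends at t leaves the risk set after t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            cum += deaths / n_at_risk
            curve.append((t, cum))
        n_at_risk -= leaving
    return curve

def cum_hazard_at(curve, t):
    """Value of the step function H at time t (0 before the first event)."""
    value = 0.0
    for s, h in curve:
        if s > t:
            break
        value = h
    return value

def hazard_difference_by_window(times, events, groups, edges):
    """Average hazard of group 1 minus group 0 in each window (a, b].

    edges is an increasing list such as [0, 3, 6]. Windows that extend
    past a group's last follow-up time are not really estimable; this
    sketch does not guard against that.
    """
    curves = []
    for g in (0, 1):
        idx = [k for k, grp in enumerate(groups) if grp == g]
        curves.append(nelson_aalen([times[k] for k in idx],
                                   [events[k] for k in idx]))
    diffs = []
    for a, b in zip(edges[:-1], edges[1:]):
        rates = [(cum_hazard_at(c, b) - cum_hazard_at(c, a)) / (b - a)
                 for c in curves]
        diffs.append(rates[1] - rates[0])
    return diffs
```

A positive entry means group 1 had the higher event rate in that window. This still requires choosing window edges, so it shares the ad hoc flavour of the break-point idea in the question; the regression approaches in the book estimate the time-varying effect without fixing such cut points in advance.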

A last note: if the hazards are not proportional, then you just cannot have one value for the hazard ratio.
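To make this concrete with invented numbers (not taken from the question): suppose group B has a constant hazard, and group A has twice that hazard before some time $\tau$ and half of it afterwards. The hazard ratio is then a function of time,

$$
\theta(t) = \frac{h_A(t)}{h_B(t)} =
\begin{cases}
2, & t < \tau \\
0.5, & t \ge \tau
\end{cases}
$$

Under proportional hazards $\theta(t)$ would be a single constant $\theta$. Here no one number describes it, and any "overall" hazard ratio is an average of $\theta(t)$ whose value depends on the length of follow-up and the censoring pattern.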