Survival Analysis – Comparing Median Survival-Times to Cox Regression When Proportional Hazard Assumption Fails

cox-model, survival

I've used Cox regression to assess the association of an exposure with an event. My Cox analysis suggests that exposed subjects have an increased risk of the event compared to unexposed subjects. Because the exposure violated the proportional hazards assumption, I implemented a time-dependent exposure covariate. Even once the proportional hazards assumption was satisfied, the hazard ratio (HR) didn't change much. The analysis continued to suggest that exposed subjects have an increased risk of the event.

A colleague took my data and found the median survival time for those not exposed was shorter (only by a small amount) than those exposed. On the surface, this appears opposite to the Cox results. I think a comparison of median survival times was incorrect. Firstly, a median over the raw data does not consider subjects who died, left the study, or were lost to follow-up. These factors can be handled using a Kaplan-Meier curve, treating them as right-censored. However, a KM curve assumes the covariate effect is constant over time. Therefore, KM curves aren't helpful when dealing with time-dependent data.

I'm dealing with the top brass, the big wigs. Presenting Cox data is often met with reluctant acceptance (it's not intuitive at a moment's glance). My colleague hoped to find a duration-based (e.g., median time to event) approach that was more "illustrative" than Cox regression. I've explained that a median or even a KM curve would be inappropriate due to the time dependency in the data.

Are there any techniques I could use to graphically illustrate (like a KM curve) the survival times whilst matching well with the Cox results?

Best Answer

If your 2 groups are clearly differentiated by exposure versus no-exposure at the start of the study, individuals don't change exposure group thereafter, and there are no other covariates in your model, then a Kaplan-Meier analysis is OK. It doesn't assume that the effect of group membership is constant over time, just that the groups have distinguishing characteristics and that group membership doesn't change over time. You can readily get crossing Kaplan-Meier survival curves for 2 groups, with one group having better survival at early times and worse survival at later times. Of course, in that situation there aren't proportional hazards (PH) for group membership.
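To make the role of censoring concrete, here is a minimal Python sketch of the Kaplan-Meier product-limit estimator applied to two fixed groups. The data are hypothetical toy values (not from the question) and the function names are illustrative only; in real work an established implementation such as `survfit` in R's survival package would be the usual choice.

```python
from collections import Counter
from statistics import median

def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct observed event time.

    times  : follow-up time for each subject
    events : 1 if the event was observed at that time, 0 if right-censored
    """
    n_events = Counter(t for t, e in zip(times, events) if e == 1)
    curve, surv = [], 1.0
    for t in sorted(n_events):
        at_risk = sum(1 for u in times if u >= t)  # still followed just before t
        surv *= 1.0 - n_events[t] / at_risk        # product-limit step
        curve.append((t, surv))
    return curve

def km_median(curve):
    """First time at which estimated survival is 0.5 or lower (None if never reached)."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None

# Hypothetical toy data: group membership is fixed at time 0.
exposed_t   = [2, 3, 3, 5, 8, 9, 12, 12]
exposed_e   = [1, 1, 0, 1, 1, 0, 1, 0]
unexposed_t = [4, 6, 7, 10, 11, 13, 14, 15]
unexposed_e = [1, 0, 1, 1, 0, 1, 0, 0]

km_exposed = kaplan_meier(exposed_t, exposed_e)
km_unexposed = kaplan_meier(unexposed_t, unexposed_e)

# A raw median of follow-up times treats censored subjects as if they had the event.
naive_exposed = median(exposed_t)
naive_unexposed = median(unexposed_t)

print("KM medians:", km_median(km_exposed), km_median(km_unexposed))
print("Naive medians:", naive_exposed, naive_unexposed)
```

With these toy data the Kaplan-Meier medians are 8 and 13, while the raw medians of follow-up time are 6.5 and 10.5, which shows how ignoring censoring shifts the summary.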

Comparing median survival times can be tricky. This answer outlines the problems, with a link to further reading. For example, with crossing survival curves you could get the same median survival despite major overall differences in survival patterns between the groups.
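As a small illustration of that last point, the Python sketch below uses two hand-specified, hypothetical survival step functions that cross. They share the same median but differ markedly at early and late times. The curve values and helper names are assumptions made purely for illustration.

```python
def surv_at(curve, t):
    """Value of a right-continuous survival step function at time t."""
    s = 1.0
    for time, value in curve:
        if time <= t:
            s = value
    return s

def median_time(curve):
    """First time at which survival is 0.5 or lower (None if never reached)."""
    for time, value in curve:
        if value <= 0.5:
            return time
    return None

# Hypothetical curves as (time, survival) steps.
curve_a = [(1, 0.70), (2, 0.60), (6, 0.50), (12, 0.45)]  # early losses, then a plateau
curve_b = [(1, 0.95), (4, 0.90), (6, 0.50), (12, 0.10)]  # good early, poor late

for label, curve in (("A", curve_a), ("B", curve_b)):
    print(label, "median:", median_time(curve),
          "S(3):", surv_at(curve, 3), "S(12):", surv_at(curve, 12))
```

Both groups have a median of 6, yet group B does better at time 3 and much worse at time 12, so the median alone hides the difference in survival patterns.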

If your model involves additional covariates, the raw Kaplan-Meier curves might be misleading. You then are better served by illustrating model predictions for survival over time between the two groups, for covariate values representative of the population of interest.

I think part of your problem is in the way that you are thinking about the "time-dependent exposure covariate" that you invoked to deal with the violation of proportional hazards. That's typically done in the service of estimating a time-varying regression coefficient for the underlying exposure covariate (see Section 4.2 of the R time dependence vignette). You will get into trouble if you think of that as a time-dependent variable in any real sense; it's a construct to let you apply the simple Cox event-time-by-event-time analysis when the association of the underlying covariate with outcome changes over time. (I'm assuming that your subjects either were exposed or weren't exposed at the start of the study; if exposure changes over time then you do have to model exposure itself as a time-varying covariate.)
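One way to write this down, as a sketch under the assumption that exposure $x$ is fixed at baseline and that the coefficient varies with a chosen function $g(t)$ of time (the form of $g$, for example $\log t$, is an assumption here and not something stated in the question):

$$ h(t;x) = h_0(t)\, e^{\beta(t)\,x}, \qquad \beta(t) = \beta_0 + \beta_1\, g(t), $$

so that

$$ \beta(t)\,x = \beta_0\, x + \beta_1\,\bigl[x\, g(t)\bigr]. $$

The product $x\,g(t)$ is the "time-dependent covariate" that enters the fitting software, but $x$ itself never changes. The hazard ratio for exposed versus unexposed subjects at time $t$ is then $e^{\beta_0 + \beta_1 g(t)}$, which reduces to the usual constant $e^{\beta_0}$ when $\beta_1 = 0$.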

Also, if you have a very large number of events, it's very easy to get a "statistically significant" violation of PH that doesn't matter in practice. If PH is violated, the Cox regression coefficient provides a type of event-averaged hazard ratio that can still be informative.

My suggestions:

If the exposure groups are clearly distinguished at time = 0, their membership doesn't change thereafter, and there aren't covariates in your model, show the Kaplan-Meier curves (which take censoring into account). To demonstrate "statistical significance" of the difference in survival, cite a "log-rank" test, which in this case is just the "score" test reported for the original Cox model (whether PH holds or not).

If there are covariates in your Cox model, show differences in model predictions of survival over time by exposure group at otherwise the same representative covariate values. Explain that as "controlling for" those covariates to highlight the specific exposure-associated differences. The fairest representation would be from the model using the time-varying coefficient. Express "statistical significance" in terms of that of the coefficient(s) associated with exposure.
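For the log-rank test mentioned in the first suggestion, the following is a minimal Python sketch of the two-group statistic, built from observed and expected event counts at each event time. The data are hypothetical, the function name is illustrative, and the sketch has no safeguards (for example against a zero variance); a vetted implementation such as `survdiff` in R's survival package is preferable in practice.

```python
import math

def logrank(times1, events1, times2, events2):
    """Two-group log-rank chi-square statistic (1 df) and its p-value."""
    all_times = list(times1) + list(times2)
    all_events = list(events1) + list(events2)
    event_times = sorted({t for t, e in zip(all_times, all_events) if e == 1})

    observed1 = expected1 = variance = 0.0
    for t in event_times:
        n1 = sum(1 for u in times1 if u >= t)  # at risk in group 1
        n2 = sum(1 for u in times2 if u >= t)  # at risk in group 2
        d1 = sum(1 for u, e in zip(times1, events1) if u == t and e == 1)
        d2 = sum(1 for u, e in zip(times2, events2) if u == t and e == 1)
        n, d = n1 + n2, d1 + d2
        observed1 += d1
        expected1 += d * n1 / n                # expected events under no difference
        if n > 1:                              # hypergeometric variance term
            variance += d * (n1 / n) * (n2 / n) * (n - d) / (n - 1)

    chi2 = (observed1 - expected1) ** 2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2))   # upper tail of chi-square with 1 df
    return chi2, p_value

# Hypothetical toy data: every subject has the event, group 1 earlier than group 2.
group1_t, group1_e = [1, 2], [1, 1]
group2_t, group2_e = [3, 4], [1, 1]
chi2, p_value = logrank(group1_t, group1_e, group2_t, group2_e)
print(round(chi2, 3), round(p_value, 3))
```

Because the statistic only compares observed with expected events across all event times, it remains a valid test of "no difference" even when hazards are not proportional, although it loses power when the curves cross.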

Related Solutions

Solved – How to interpret a Cox hazard model survival curve

Since the hazard depends on the covariates, so does the survival function. The model assumes that the hazard function of an individual with covariate vector $x$ is $$ h(t;x) = h_0(t) e^{\beta'x}. $$ Hence, the cumulative hazard of this individual is $$ H(t;x) = \int_0^t h(u;x) du=\int_0^t h_0(u) e^{\beta'x} du = H_0(t)e^{\beta'x}, $$ where we define $H_0(t)=\int_0^t h_0(u) du$ as the baseline cumulative hazard. The survival function for an individual with covariate vector $x$ is in turn $$ S(t;x) = e^{-H(t;x)}=e^{-H_0(t) e^{\beta'x}}=S_0(t)^{e^{\beta'x}}, $$ where we define $S_0(t) = e^{-H_0(t)}$ as the baseline survival function.

Given estimates $\hat\beta$ and $\hat S_0(t)$ of the regression coefficients and the baseline survival function, an estimate of the survival function for an individual with covariate vector $x$ is given by $\hat S(t;x)=\hat S_0(t)^{e^{\hat\beta'x}}$.

To compute this in R, specify the values of your covariates in the newdata argument. For example, if you want the survival function for individuals with age = 70, run

plot(survfit(fit, newdata=data.frame(age=70)))

If you, as you do, omit the newdata argument, its default value equals the average values of the covariates in the sample (see ?survfit.coxph). So what is shown in your graph is an estimate of $S_0(t)^{e^{\beta'\bar x}}$.
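The relation $S(t;x) = S_0(t)^{e^{\beta'x}}$ can also be checked numerically. The Python sketch below applies it to a hypothetical baseline survival curve and a hypothetical coefficient; none of these numbers come from a fitted model, and the function name is illustrative.

```python
import math

def predicted_survival(baseline_surv, beta, x):
    """Apply S(t; x) = S0(t) ** exp(beta'x) to each baseline survival value."""
    linear_predictor = sum(b * xi for b, xi in zip(beta, x))
    return [s0 ** math.exp(linear_predictor) for s0 in baseline_surv]

# Hypothetical baseline survival at four time points, and one binary covariate
# (exposure) whose coefficient corresponds to a hazard ratio of 2.
baseline = [1.0, 0.9, 0.8, 0.6]
beta = [math.log(2.0)]

unexposed = predicted_survival(baseline, beta, [0])  # equals the baseline curve
exposed = predicted_survival(baseline, beta, [1])    # baseline values squared

print([round(s, 2) for s in unexposed])
print([round(s, 2) for s in exposed])
```

A hazard ratio of 2 squares the baseline survival probabilities, so 0.9, 0.8 and 0.6 become roughly 0.81, 0.64 and 0.36. Plotting both predicted curves gives a Kaplan-Meier-like picture that agrees with the Cox model by construction.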

Solved – How to model survival analysis when proportional hazards assumption is not met and stratification and time-varying are not possible

I disagree that accelerated failure time (AFT) and proportional odds (PO) models are necessarily the right next steps. It depends on what you want to learn from the model. If you want to estimate a hazard ratio, understand that the very idea of a single hazard ratio applying over a period of time implies, to some extent, that the hazards are proportional.

On the other hand, in many applications there are much more informative summaries of survival analyses than hazard ratios, which don't inherently have a PH assumption baked in. For example, you can calculate risk differences and risk ratios at domain-relevant time-points. These are typically more intuitive and easier to interpret correctly than hazard ratios. RDs and RRs are still available using stratified Cox models (assuming your exposure is categorical). For an overview of these ideas, you can have a look at this reference: https://pubmed.ncbi.nlm.nih.gov/25660080/
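As a rough illustration of these summaries, the Python sketch below turns two survival probabilities at a chosen time point into a risk difference and a risk ratio. The survival values are hypothetical; in practice they would come from Kaplan-Meier curves or from model-based predictions, and the function name is illustrative.

```python
def risk_contrast(surv_exposed, surv_unexposed):
    """Risk difference and risk ratio at one time point, from survival probabilities."""
    risk_exposed = 1.0 - surv_exposed      # cumulative risk by that time
    risk_unexposed = 1.0 - surv_unexposed
    return {
        "risk_difference": risk_exposed - risk_unexposed,
        "risk_ratio": risk_exposed / risk_unexposed,
    }

# Hypothetical survival probabilities at a domain-relevant time, e.g. 5 years.
contrast = risk_contrast(surv_exposed=0.70, surv_unexposed=0.80)
print({name: round(value, 3) for name, value in contrast.items()})
```

Here the exposed group has a 10 percentage point higher risk by that time, or 1.5 times the risk, and neither statement relies on proportional hazards.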

Now, if you insist on summarizing your data using hazard ratios, and hazards are not proportional, you can examine how the hazard ratio changes over time using interactions between time and your time-invariant covariate of interest. This is a valid use of Cox models under non-proportional hazards and can be quite informative, as explained in this paper: https://pubmed.ncbi.nlm.nih.gov/12915864/. On the other hand, if the violation of proportionality is not too extreme, a single hazard ratio can still be a reasonable summary of the data, as explained in this paper: https://pubmed.ncbi.nlm.nih.gov/32167523/.
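To show what a time-by-covariate interaction implies, here is a minimal Python sketch that evaluates a time-varying hazard ratio of the form exp(b0 + b1 * g(t)). The coefficients and the choice g(t) = log(t) are assumptions for illustration, not estimates from any data.

```python
import math

def hazard_ratio_at(t, b0, b1, g=math.log):
    """Hazard ratio at time t when the log hazard ratio is b0 + b1 * g(t)."""
    return math.exp(b0 + b1 * g(t))

# Hypothetical coefficients: HR of 2 at t = 1, shrinking as time goes on.
b0, b1 = math.log(2.0), -0.3

for t in (1, 5, 20):
    print(t, round(hazard_ratio_at(t, b0, b1), 2))
```

With these assumed values the hazard ratio starts at 2, declines with time and eventually drops below 1, which is the kind of pattern a single averaged hazard ratio would conceal.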
