Solved – How to model survival when the proportional hazards assumption is not met and stratification and time-varying models are not possible

cox-model, proportional-hazards, splines, survival, weibull-distribution

I am modelling survival over a rather long follow-up period (10 years). My exposure is time-invariant and clearly violates the proportional hazards assumption, so Cox proportional hazards regression is not an option. I was wondering about alternative ways to conduct my analyses. Some key points:

Stratification is not possible because it is my main variable of interest that violates the assumption, and I need to compare the groups.
Time-varying models are not possible given the nature of my main variable of interest.
I initially thought about a time-partitioned model (splitting follow-up time and interacting time with my main variable of interest), but I am not sure that is a good idea: when I plot the KM curves they cross each other, so I struggle to find a good time interval for splitting.
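For reference, the time-partitioned idea in the last point can be sketched in R with the survival package (the question itself uses Stata; R is used here only to match the code elsewhere on this page). The data are simulated and the cut points at 2 and 5 years are hypothetical choices, which is exactly the arbitrariness the question worries about. The model gives one exposure hazard ratio per period.

```r
library(survival)

## Hypothetical simulated data: binary, time-invariant exposure whose Weibull
## event times have a different shape than the unexposed group (crossing hazards)
set.seed(11)
n <- 600
exposed <- rep(0:1, each = n / 2)
event_time <- rweibull(n, shape = ifelse(exposed == 1, 0.7, 1.5),
                          scale = ifelse(exposed == 1, 12, 9))
cens_time <- pmin(runif(n, 0, 15), 10)   # random censoring, follow-up capped at 10 years
dat <- data.frame(time = pmin(event_time, cens_time),
                  status = as.integer(event_time <= cens_time),
                  exposed = exposed)

## Split each subject's follow-up at the (hypothetical) cut points.
## survSplit() adds a 'tstart' column and an interval number 'period'.
split_dat <- survSplit(Surv(time, status) ~ ., data = dat,
                       cut = c(2, 5), episode = "period")

## Exposure interacted with the period strata: one coefficient per period,
## i.e. a piecewise-constant hazard ratio over follow-up
fit_split <- coxph(Surv(tstart, time, status) ~ exposed:strata(period),
                   data = split_dat)
hr <- exp(coef(fit_split))
print(round(hr, 2))
```

With crossing hazards the period-specific hazard ratios move from above 1 to below 1, and the estimates can shift noticeably if the cut points are moved.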

Given these considerations, I have thought about using flexible parametric models. However, from my understanding they make strong assumptions about the shape of the curve, which is something I cannot be certain of. Would a flexible parametric model with restricted cubic splines be what I am looking for? If so, how can I choose the number of knots? And what about the distribution?

Could you please provide some inputs and examples? What do you suggest?

I use Stata MP 15.

Best Answer

I disagree that accelerated failure time (AFT) and proportional odds (PO) models are necessarily the right next steps. It depends on what you want to learn from the model. If you are interested in estimating a hazard ratio, understand that the very idea of one hazard ratio applying over a period of time implies, to some extent, that the hazards must be proportional.
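To see why a single hazard ratio presupposes proportionality, write the hazard ratio as a function of time, HR(t) = h1(t) / h0(t). As an illustration only (the Weibull form is an assumption, not part of the original answer), let group g have a Weibull hazard with shape k_g and scale lambda_g:

```latex
\begin{aligned}
h_g(t) &= \frac{k_g}{\lambda_g}\left(\frac{t}{\lambda_g}\right)^{k_g - 1}, \qquad g \in \{0, 1\},\\[4pt]
\mathrm{HR}(t) = \frac{h_1(t)}{h_0(t)} &= \frac{k_1\,\lambda_0^{k_0}}{k_0\,\lambda_1^{k_1}}\; t^{\,k_1 - k_0}.
\end{aligned}
```

HR(t) is a constant, and hence summarized by one number, only when k_1 = k_0. With different shapes it drifts with t, and any single estimate can at best be read as a rough average of it over follow-up.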

On the other hand, in many applications there are much more informative summaries of a survival analysis than hazard ratios, and they don't have a PH assumption baked in. For example, you can calculate risk differences (RDs) and risk ratios (RRs) at domain-relevant time points. These are typically more intuitive, and easier to interpret correctly, than hazard ratios. RDs and RRs are still available from a Cox model stratified on the exposure (assuming your exposure is categorical): each stratum gets its own baseline hazard, so no proportionality is assumed between the exposure groups. For an overview of these ideas, have a look at this reference: https://pubmed.ncbi.nlm.nih.gov/25660080/
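A minimal sketch of the stratified-Cox route in R (survival package), on hypothetical simulated data. The exposure is put in strata(), one adjustment covariate (age) is assumed, and survival is predicted for a reference subject of mean age, so the resulting RD and RR are conditional on that age. The 5-year time point is an arbitrary choice for illustration; averaging predictions over every subject's covariates (standardization) would be one way to get marginal versions instead.

```r
library(survival)

## Hypothetical simulated data: two exposure groups whose Weibull event times
## have different shapes, so their hazards are NOT proportional
set.seed(2024)
n <- 600
group <- factor(rep(c("control", "exposed"), each = n / 2))
age <- rnorm(n, mean = 60, sd = 10)
shape <- ifelse(group == "exposed", 0.7, 1.5)
scale <- ifelse(group == "exposed", 12, 9)
event_time <- rweibull(n, shape = shape, scale = scale) * exp(-0.01 * (age - 60))
cens_time <- pmin(runif(n, 0, 15), 10)   # random censoring, follow-up capped at 10 years
dat <- data.frame(time   = pmin(event_time, cens_time),
                  status = as.integer(event_time <= cens_time),
                  group  = group,
                  age    = age)

## Stratifying on the exposure: separate baseline hazard per group,
## so no PH assumption is imposed on the exposure itself
fit <- coxph(Surv(time, status) ~ strata(group) + age, data = dat)

## Predicted survival curve in each stratum for a reference subject (mean age)
sf <- survfit(fit, newdata = data.frame(age = mean(dat$age)))

## Risk = 1 - S(t) at the chosen time point (5 years here)
s5 <- summary(sf, times = 5)
risk <- 1 - as.numeric(s5$surv)
names(risk) <- as.character(s5$strata)

rd <- unname(risk["group=exposed"] - risk["group=control"])  # risk difference
rr <- unname(risk["group=exposed"] / risk["group=control"])  # risk ratio
print(round(c(risk, RD = rd, RR = rr), 3))
```

Confidence intervals are not shown; bootstrapping the whole procedure is one common option.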

Now, if you insist on summarizing your data using hazard ratios and the hazards are not proportional, you can examine how the hazard ratio changes over time by including interactions between time and your time-invariant covariate of interest. This is a valid use of Cox models under non-proportional hazards and can be quite informative, as explained in this paper: https://pubmed.ncbi.nlm.nih.gov/12915864/. On the other hand, if the violation of proportionality is not too extreme, a single hazard ratio can still be a reasonable summary of the data, as explained in this paper: https://pubmed.ncbi.nlm.nih.gov/32167523/.
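A sketch of the time-interaction approach in R, on hypothetical simulated data. It first fits a single-HR Cox model and checks proportionality with cox.zph() (Schoenfeld residuals), then lets the log hazard ratio vary linearly in log(time) through the tt() mechanism of coxph. The log(t) form is an assumption; other functions of time (or a step function) are equally possible.

```r
library(survival)

## Hypothetical simulated data with non-proportional (crossing) hazards
set.seed(2024)
n <- 600
exposed <- rep(0:1, each = n / 2)            # time-invariant binary exposure
event_time <- rweibull(n, shape = ifelse(exposed == 1, 0.7, 1.5),
                          scale = ifelse(exposed == 1, 12, 9))
cens_time <- pmin(runif(n, 0, 15), 10)       # follow-up capped at 10 years
dat <- data.frame(time = pmin(event_time, cens_time),
                  status = as.integer(event_time <= cens_time),
                  exposed = exposed)

## 1. Single hazard ratio, plus the Schoenfeld-residual test of proportionality
fit_ph <- coxph(Surv(time, status) ~ exposed, data = dat)
print(cox.zph(fit_ph))

## 2. Exposure-by-time interaction: log HR(t) = beta1 + beta2 * log(t)
fit_tt <- coxph(Surv(time, status) ~ exposed + tt(exposed), data = dat,
                tt = function(x, t, ...) x * log(t))
b <- coef(fit_tt)

## Estimated hazard ratio (exposed vs. unexposed) at a few time points in years
times <- c(1, 2, 5, 10)
hr_t <- exp(b[["exposed"]] + b[["tt(exposed)"]] * log(times))
print(data.frame(time = times, HR = round(hr_t, 2)))
```

A negative coefficient on tt(exposed) means the hazard ratio shrinks as follow-up goes on; in this simulated setting it crosses 1.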

Related Solutions

Solved – Comparison of CPH, accelerated failure time model or neural networks for survival analysis

It depends on why you are making models. Two main reasons to construct survival models are (1) to make predictions or (2) to model effect sizes of covariates.

If you want to use them in a predictive setting, in which you want to obtain an expected survival time given a set of covariates, neural networks are likely the best choice because they are universal approximators and make fewer assumptions than the usual (semi-)parametric models. Another option, less popular but not less powerful, is support vector machines.

If you are modelling to quantify effect sizes, neural networks won't be of much use. Both Cox proportional hazards and accelerated failure time models can be used for this goal. Cox PH models are by far the most widely used in clinical settings, in which the hazard ratio gives a measure of effect size for each covariate/interaction. In engineering settings, however, accelerated failure time (AFT) models are the weapon of choice.
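To illustrate the two effect-size scales, here is a sketch in R fitting both a Cox model (coxph) and a Weibull AFT model (survreg) to the same hypothetical simulated data. Weibull data are used because that model has both the PH and the AFT property, so the Cox hazard ratio can be compared with the hazard ratio implied by the AFT fit, exp(-beta / scale). The true log hazard ratio of 0.5 and the shape of 1.5 are arbitrary choices for the simulation.

```r
library(survival)

## Hypothetical simulated Weibull data: cumulative hazard H(t | x) = t^k * exp(0.5 * x)
set.seed(7)
n <- 2000
x <- rbinom(n, 1, 0.5)
k <- 1.5                                        # Weibull shape
event_time <- (rexp(n) / exp(0.5 * x))^(1 / k)  # inverse-transform draw, true log HR = 0.5
cens_time <- runif(n, 0, 3)
dat <- data.frame(time = pmin(event_time, cens_time),
                  status = as.integer(event_time <= cens_time),
                  x = x)

## Cox PH: effect size is a hazard ratio
cox_fit <- coxph(Surv(time, status) ~ x, data = dat)
hr_cox <- exp(coef(cox_fit)[["x"]])

## Weibull AFT: exp(beta) is a time ratio (multiplies survival time)
aft_fit <- survreg(Surv(time, status) ~ x, data = dat, dist = "weibull")
time_ratio <- exp(coef(aft_fit)[["x"]])
## For the Weibull model the implied hazard ratio is exp(-beta / scale)
hr_aft <- exp(-coef(aft_fit)[["x"]] / aft_fit$scale)

print(round(c(HR_cox = hr_cox, HR_from_AFT = hr_aft, time_ratio = time_ratio), 3))
```

A hazard ratio above 1 corresponds here to a time ratio below 1: the same effect, read as "higher hazard" in the Cox model and "shorter survival time" in the AFT model.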

Cox Model – How to Compute Partial Log-Likelihood Function

This is technically a programming question with an easy programming answer. If you simply want the partial likelihood, why not fool R into giving it to you? Simply initialize beta at the value of interest (the init argument), allow no iterations (iter.max = 0), then extract the loglik component from the coxph object (see ?coxph.object).

For example:

## artificial data
library(survival)
n <- 100
t <- rexp(n)
c <- rbinom(n, 1, .2)       ## event indicator: 1 = event, 0 = censored (independent process)
x <- rbinom(n, 1, exp(-t))  ## some arbitrary relationship between x and t
betamax <- coxph(Surv(t, c) ~ x)  ## full fit: loglik at beta = 0 and at the MLE
beta1 <- coxph(Surv(t, c) ~ x, init = c(1), control = coxph.control(iter.max = 0))  ## no iterations: loglik at beta = 1

With example output (no random seed is set, so your values will differ):

> betamax$loglik
[1] -68.62548 -65.99652
> beta1$loglik
[1] -66.10908 -66.10908

You can even define a wrapper and plot the log partial likelihood over a grid of beta values:

loglik <- function(beta, formula) {
  ## zero iterations, so the returned loglik is evaluated at the supplied beta
  coxph(formula, init = beta, control = coxph.control(iter.max = 0))$loglik[2]
}

betas <- seq(0, 2, by=0.01)
logliks <- sapply(betas, loglik, Surv(t, c) ~ x)
plot(betas, logliks)
abline(v=betamax$coefficients)
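As a cross-check, the log partial likelihood can also be computed directly from its definition and compared with the value coxph reports. This sketch assumes a single covariate and no tied event times (true for continuous draws such as rexp()), in which case the Breslow form below coincides with coxph's default Efron handling of ties. The function name partial_loglik is hypothetical.

```r
library(survival)

## Log partial likelihood for one covariate, assuming no tied event times:
##   l(beta) = sum over events i of [ x_i * beta - log( sum_{j: t_j >= t_i} exp(x_j * beta) ) ]
partial_loglik <- function(beta, time, status, x) {
  eta <- x * beta
  ll <- 0
  for (i in which(status == 1)) {
    at_risk <- time >= time[i]                  # risk set at the i-th event time
    ll <- ll + eta[i] - log(sum(exp(eta[at_risk])))
  }
  ll
}

## Same kind of artificial data as above, with a seed for reproducibility
set.seed(1)
n <- 100
time <- rexp(n)
status <- rbinom(n, 1, 0.2)
x <- rbinom(n, 1, exp(-time))

manual <- partial_loglik(1, time, status, x)
fit0 <- coxph(Surv(time, status) ~ x, init = 1,
              control = coxph.control(iter.max = 0))
from_coxph <- tail(fit0$loglik, 1)              # loglik evaluated at beta = 1
print(c(manual = manual, coxph = from_coxph))
```

The explicit loop is slow for large data but makes the risk-set structure of the partial likelihood visible.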
