Solved – Simulation in R of data based on Cox proportional-hazards model for power analysis

cox-modelrstatistical-powersurvival

I would like to fit data based on Cox proportional-hazards model and then simulate new data based on a fitted model. For instance, provided that the fit is

# Create the simplest test data set 
test1 <- list(time=c(4,3,1,1,2,2,3), 
               status=c(1,1,1,0,1,1,0), 
               x=c(0,2,1,1,1,0,0)) 
# Fit a survival model 
mod_cox <- coxph(Surv(time, status) ~ x,data=test1)

how would I made simulations based on the fitted mod_cox model using x as covariates?

Additionally, I would also like to make additional simulations assuming a Cox model with a hazard rate equal to a constant (say $K$) times the hazard rate of the mod_cox fitted Cox model.

I am interested in simulating censoring times because I want to perform a power analysis and I need censoring times to replicate survival analyses several times.

Best Answer

For power analysis of a Cox model you do not need to do simulations of event or censoring times.

If the covariate x of interest were binary then there would be several ways to proceed, as typical power analysis programs for Cox models are for treatment/control comparisons that can be reasonably extended to other binary covariates. But your x seems to be multi-valued, perhaps continuous, and by the structure of your model it is assumed to be linearly related to log hazard.

For planning a future observational study based on your current data set with a non-binary covariate, you need to take into account both the hazard change per unit change in the covariate value and the distribution of the values of that covariate among your population of interest. If there are additional covariates in your model, the association of your covariate of interest with those other covariates must also be considered.

The R powerSurvEpi package provides tools to handle this type of situation. For study planning its ssizeEpiCont function is designed to work with a pilot study from which it will estimate the variance of the covariate of interest, its multiple correlation with other covariates, and the fraction of cases that had an event. Specify the significance level, hypothesize the hazard ratio (for example, based on what you found in mod_cox), and the function will calculate the number of cases needed. If you want to calculate power instead, the package's powerEpiCont function provides similar handling for calculating power based on a given a number of cases. The formulas used are shown clearly in the manual pages for those functions.

These calculations are based on a paper by Hsieh and Lavori, who reported:

Simulations show that the censored observations do not contribute to the power of the test in the proportional hazards model [for a continuous covariate], a fact that is well known for a binary covariate.

That's a critically important point about survival analysis: it's the number of events that provides the power.

You thus do not have to be concerned further about the timings or numbers of the censored cases. The values passed to the functions noted above don't even require the event times from the pilot study, just which cases had events. The total number of cases needed for a specified power is then related to number of events that is needed and the fraction of cases that had events. More complicated simulations are not required.

Related Question