survival – How to Forecast Future Events Using Survival Analysis

cox-modelforecastingrsurvivaltime series

I am studying survival analysis and am trying to see if there's a way to probabilistically forecast future outcomes, using simulation or other means.

In the first example below, I fit a Cox model to the complete "lung" data from the survival package, showing 1000 months of outcomes. In the second example, I adjust the "lung" data as-if I only had 500 months of survival data, creating object "lung1".

Using survival analysis, how could I probabilistically forecast events for months 501-1000 for lung1, assuming I only had data for months 1-500? I've used time-series forecasting models (ETS, ARIMA, etc.) but I wonder if there's a better solution using survival analysis? A problem with these time-series models is generating negative survival outcomes which obviously is impossible. Nevertheless, I post an image below of an ETS forecast model I've used before with log adjustments to eliminate negative-value outcomes.

I post simple code for the Cox survival models at the bottom. Images for "lung" and truncated "lung1" data:

enter image description here

Example of ETS time-series model forecast (using other data):

enter image description here

Code:

# Example from http://www.sthda.com/english/wiki/cox-proportional-hazards-model

library(survival)
library(survminer)

# status 1 = censored
# status 2 = dead

### Full data set ###
# Cox regression of time to death on the time-constant covariates 
cox <- coxph(Surv(time, status) ~ age + sex + ph.ecog, data =  lung)

# Plot the baseline survival function
ggsurvplot(survfit(cox, data = lung), palette = "#2E9FDF", ggtheme = theme_minimal())

### Truncate the full data set "as if" we only had the first half of the time series available
# lung1 reduces study time to 500 months (from 1000) and adjusts status (via status1) at month 500 cut-off
lung1 <- lung %>%
    mutate(time1 = pmin(time,500)) %>%
    mutate(status1 = if_else(time > time1,as.integer(1),as.integer(status)))

# Cox regression of time to death on the time-constant covariates 
cox1 <- coxph(Surv(time1, status1) ~ age + sex + ph.ecog, data =  lung1)

# Plot the truncated survival data
myplot <- ggsurvplot(survfit(cox1, data = lung1), palette = "#2E9FDF", ggtheme = theme_minimal(),xlim = c(0, 1000))
myplot$plot <- myplot$plot + 
  scale_x_continuous(breaks = sort(c(seq(0, 1000, 250))))
myplot

Best Answer

You can't do that with a Cox model, as it provides no survival information beyond the last observed event time.

You can try to do that with a parametric survival model, for example one of those provided by the survreg() function in R.

For your plot based on the survfit() function applied to a Cox model, "Default is the mean of the covariates used in the coxph fit," according to the survfit.coxph help page. In your case, one might ask whether a mean value of sex is a useful concept.

There is no such default for a survreg object. You specify particular covariate values in a data frame to the predict() function. The last example on the predict.survreg help page shows how to generate a survival plot with standard errors for a simpler model also based on the lung data set.

Related Question