Survival Analysis Forecasting – Predicting Future Survival Probabilities

forecastingrsurvivaltime series

I've used time-series forecasting (ETS, ARIMA, etc.) models but because of the depth of my data I'm exploring using the survival and related packages in R to forecast future period survival probabilities. For example, if we have 24 months of survival data, I'm interested in forecasting months 25-36 using the statistical parameters derived from analyzing months 1-24, using either Kaplan-Meier or a parametric distribution like Weibull. In the form of simulation with multiple simulation paths derived, as I've done with traditional time-series methods. I'm exploring alternatives to the traditional time-series methods because they seem overly-simplistic in light of depth of data I have (similar to what you see in the lung and kidney datasets that are part of the survival package). Please, does anyone know if this type of probabilistic future period forecasting is possible in survival analysis, and know of any simple examples on-line that I can start with?

Best Answer

One obvious solution, as you already alluded to is to use a parametric model (such as Weibull regression). Non-parametric/semi-parametric models like Kaplan-Meier or Cox do in their standard implementation not extrapolate beyond the largest observed event time. There's other implementations that more or less do the same thing, but can extrapolate, e.g. an exponential model with piecewise constant hazard rates can be very similar to Cox regression, but will extrapolate if you just assume the hazard rate of the last interval going forward (similarly, e.g. the brms R package implements a kind of Cox model using M-splines for the baseline hazard function which I think should permit extrapolation).

E.g. if you use R, one obvious option is to penalize Weibull regression fit <- survreg(Surv(time, has_event) ~ 1 + ridge( predictor1, predictor2, ... , theta=1, scale=T), dist = "weibull", data=mydata), where the ridge-penalty theta could be picked using cross-validation. After fitting, the traditional shape parameter is given by 1/fit$scale and the scale parameter for a record predict(object=fit, newdata=newdata, type="lp"), so that you get an estimated survival curve by plotting the Weibull CDF with these parameters, but that ignores uncertainty about the parameters. This gets a lot easier to deal with in the Bayesian version of Weibull regression, which is nicely supported in the brms R package (if you want instead of individual patient time predictions to predict the parameters of the distribution, there's some nuances: see here).

Alternatively, a more machine learning oriented approach would be to e.g. use XGBoost with an accelerated failure time loss, but calibration tends to be poor. However, there's alternatives that try to fix this.

Related Solutions

survival – How to Forecast Future Events Using Survival Analysis

You can't do that with a Cox model, as it provides no survival information beyond the last observed event time.

You can try to do that with a parametric survival model, for example one of those provided by the survreg() function in R.

For your plot based on the survfit() function applied to a Cox model, "Default is the mean of the covariates used in the coxph fit," according to the survfit.coxph help page. In your case, one might ask whether a mean value of sex is a useful concept.

There is no such default for a survreg object. You specify particular covariate values in a data frame to the predict() function. The last example on the predict.survreg help page shows how to generate a survival plot with standard errors for a simpler model also based on the lung data set.

Simulation – How to Generate Simulation Paths for Weibull Distribution in R?

Your simulated_data are just 20 draws from a single fixed Weibull distribution having the specified shape and scale values. They are just samples of individual survival times drawn from that single distribution. This shows the associated survival function:

 plot(1:70,1-pweibull(1:70,shape=1.5, scale=1/0.03),type="l",bty="n",xlab="Time",ylab="Survival probability")

If the Weibull distribution is fixed, then the survival function is also fixed and there would be no variability in survival curves.

If you want to evaluate uncertainty in survival-curve estimates based on modeled data, then you want to draw from distributions of the shape and scale values consistent with the variance-covariance matrix of their modeled estimates, for example with the mvrnorm() function of the R MASS package, and display the set of survival curves in the way indicated above.

Best Answer

Related Solutions

survival – How to Forecast Future Events Using Survival Analysis

Simulation – How to Generate Simulation Paths for Weibull Distribution in R?

Related Question