Projecting survival curve estimates into the future

Tags: kaplan-meier, r, survival

Let's suppose I have a Kaplan-Meier survival curve covering 0 to 6000 days. How can I project survival rates for day 6001 and beyond? Is there a function or extrapolation method I can use?

Below is an example, for illustration only:

library(survival)
library(ISwR)
mfit <- survfit(Surv(days, status == 1)~1, data = melanom)

How can I project the curve beyond what is observed below?

[Figure: Kaplan-Meier survival curve from the fit above]

EDIT:

Based on the great response from @CliffAB, I would like to add to the question above:

What if we assume a parametric model with a specific distribution, instead of the non-parametric KM curve? For instance, if I assume a log-normal distribution for the same data and fit it, can I use the survival function of the assumed distribution to project the data?

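library(flexsurv)
# Fit an intercept-only log-normal model to the same data
parm.curves <- flexsurvreg(Surv(days, status == 1) ~ 1, dist = "lnorm", data = melanom)
# Plot the fitted parametric survival curve
plot(parm.curves)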

[Figure: output of plot(parm.curves)]

The data I am actually working with concerns customer retention, and it does not behave like the data above, which is for illustrative purposes only. It does show, though, how difficult this kind of projection is.
My question is: can we use the survival function of an assumed distribution to project future survival rates?

Thanks

Best Answer

As far as I am aware, there is no way to extrapolate a Kaplan-Meier curve beyond the last observed time with standard R software.

And with good reason, too: Kaplan-Meier curves make no assumptions about the parametric distribution of the data. Because of this, they are completely indifferent to how probability mass is assigned beyond the last observed event.
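
To make this concrete, here is a minimal sketch using the melanom fit from the question. The 1000-day offset is an arbitrary choice for illustration. It relies on the `times` and `extend` arguments of `summary()` for `survfit` objects: by default, times past the end of the curve are dropped, and `extend = TRUE` only carries the last estimate forward rather than projecting anything.

```r
library(survival)
library(ISwR)  # provides the melanom data used in the question

mfit <- survfit(Surv(days, status == 1) ~ 1, data = melanom)

# The Kaplan-Meier step function is only defined up to the last follow-up time
last_time <- max(mfit$time)
query     <- c(last_time, last_time + 1000)  # second time lies past the data

# Default: the time beyond the end of the curve is silently dropped
km_default <- summary(mfit, times = query)

# extend = TRUE: the last estimate is simply repeated, with no new information
km_extended <- summary(mfit, times = query, extend = TRUE)

print(data.frame(time = km_extended$time, surv = km_extended$surv))
```

Both rows of the printed table show the same survival value, so the "extended" curve is just a flat line, not an extrapolation.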

I'm glossing over some details here, but suppose that in your dataset only 30% of subjects are observed to have had events. You would be hard pressed to estimate the 90th percentile without making very strong assumptions about the parametric family the data were generated from. So if you really want to make estimates beyond t = 6,000, you will probably need to switch to a parametric estimator (and you should be very skeptical about those estimates!).
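
As a rough sketch of what switching to a parametric estimator could look like, the code below fits two assumed families with `survreg()` from the survival package and evaluates each fitted survival function past the observed range. The horizons in `future_days` and the choice of log-normal and Weibull are assumptions made for illustration, not a recommendation. Comparing the two columns gives a feel for how strongly the projection depends on the assumed family.

```r
library(survival)
library(ISwR)  # provides the melanom data used in the question

# Hypothetical horizons beyond the observed follow-up
future_days <- c(6000, 8000, 10000)

# Intercept-only fits under two assumed distributions
fit_lnorm   <- survreg(Surv(days, status == 1) ~ 1, data = melanom,
                       dist = "lognormal")
fit_weibull <- survreg(Surv(days, status == 1) ~ 1, data = melanom,
                       dist = "weibull")

# Log-normal: log(T) ~ Normal(mu, sigma), so S(t) = 1 - pnorm((log(t) - mu) / sigma)
surv_lnorm <- plnorm(future_days,
                     meanlog = unname(coef(fit_lnorm)),
                     sdlog   = fit_lnorm$scale,
                     lower.tail = FALSE)

# Weibull in survreg's parameterisation: shape = 1 / scale, scale = exp(intercept)
surv_weibull <- pweibull(future_days,
                         shape = 1 / fit_weibull$scale,
                         scale = exp(unname(coef(fit_weibull))),
                         lower.tail = FALSE)

projection <- data.frame(days = future_days,
                         lognormal = surv_lnorm,
                         weibull   = surv_weibull)
print(projection)
```

With the flexsurv fit from the question, `summary(parm.curves, t = future_days, type = "survival")` should give comparable log-normal estimates along with confidence intervals. Those intervals reflect sampling uncertainty only, not the possibility that the assumed family is wrong.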