How to use a survival model to predict when there are time-dependent continuous covariates

cox-model, prediction, r, regression, survival

In my practical projects, I usually track a group of people for a few months, then fit a model to their behavior over those first few months. I would like to use it to predict the event rate at future times, both overall and for each individual.

Here I will use the PBC data to illustrate my question.

library(survival) # pbc, pbcseq and tmerge all come from the survival package
temp <- subset(pbc, id <= 312, select=c(id:sex, stage)) # baseline 
pbc2 <- tmerge(temp, temp, id=id, death = event(time, status)) #set range 
pbc2 <- tmerge(pbc2, pbcseq, id=id, ascites = tdc(day, ascites), 
           bili = tdc(day, bili), albumin = tdc(day, albumin), protime = tdc(day, protime), alk.phos = tdc(day, alk.phos)) 

I separate the data into a training set and a test set. I use the training data to build the model, then use the test set for prediction.

pbc.train=pbc2[(1:995),]
pbc.test=pbc2[(996:1807),]
fit2 <- coxph(Surv(tstart, tstop, death==2) ~ log(bili) + log(protime)+age+trt, pbc.train)

Here are my questions related to prediction:

  1. If I want to see the overall survival rate at a time point that has already occurred, I can use

    summary(survfit(fit2), time=300)
    

But what if I want to predict the overall survival rate at a time point that has not happened yet, for example at time=10000? How can I do it? summary(survfit(fit2), time=10000) will not do.

  2. How do I predict each individual's survival rate in the future? I can use the predict call to compute the hazard ratio relative to the average for a set of covariates, as follows.

    covs=data.frame(bili=1.9, protime=12, age=40, trt=1) 
    predict(fit2, newdata=covs, type="risk")
    

    But the real test data is organized so that one person has multiple records. More importantly, for each individual, the time-dependent covariates change at each time interval. Therefore, if I use something like

      predict(fit2, newdata=pbc.test,type="risk")
    

the result does not make sense. I want to know each person's risk of death in the future, or at each future time point. How can I do that?

A side question: in the test data, some people have already died. Is it appropriate to delete the ids for which the event has already happened, and only use the model to predict future survival for the patients who survived the study?

Best Answer

I will start with question 2.

In the pec package there is a function called predictSurvProb which will do what you want. This function requires a coxph object (your fit2), a new data frame (your pbc.test) and the point(s) in time at which you want the survival probability. Note that I said point(s): you can also pass c(1,2,3,4) as the argument to get the survival probability at time points 1, 2, 3 and 4. It returns the survival probability for each individual at the given time points.
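As a rough illustration of the same idea, here is a minimal sketch that uses survfit from the survival package in place of pec's predictSurvProb. The covariate profiles in covs are made up for the example, and each row is treated as a covariate pattern held fixed over time, which is an assumption and not how the time-dependent test records actually behave.

```r
library(survival)

# Rebuild the counting-process data from the question
# (pbc and pbcseq ship with the survival package)
temp <- subset(pbc, id <= 312, select = c(id:sex, stage))
pbc2 <- tmerge(temp, temp, id = id, death = event(time, status))
pbc2 <- tmerge(pbc2, pbcseq, id = id,
               bili = tdc(day, bili), protime = tdc(day, protime))

fit <- coxph(Surv(tstart, tstop, death == 2) ~
               log(bili) + log(protime) + age + trt, data = pbc2)

# Hypothetical covariate profiles; each row is assumed fixed over time
covs <- data.frame(bili = c(1.9, 5), protime = c(12, 11),
                   age = c(40, 60), trt = c(1, 2))

# One predicted survival curve per row of covs
sf <- survfit(fit, newdata = covs)

# Survival probabilities at roughly 1, 2 and 5 years (times are in days)
pred <- summary(sf, times = c(365, 730, 1825))
surv_prob <- pred$surv  # rows = time points, columns = profiles
```

Here surv_prob plays the role of the matrix of individual survival probabilities described above, with one column per covariate profile.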

I think question 1 has the same answer as question 2, but with the training set (pbc.train) as the new data frame. So predictSurvProb(fit2, pbc.train, 10000) should deliver the desired results, although I am not entirely sure whether this function can handle time points that have not yet occurred.

As for your side question: I believe (though I am not sure) that people who have already died do not provide any added information, so I think it is safe to delete them. However, you can also run the code with and without them and compare the results. If the results differ, I would be inclined to think they still provide some information, and then it would be inappropriate to delete them.
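To check what happens at a time beyond the observed follow-up, a small sketch with survfit and summary from the survival package (not pec) may help. The covariate values are the ones from the question, held fixed over time as an assumption. As far as I know, the extend = TRUE argument of summary only carries the last estimate forward; it is not a genuine extrapolation, because the Cox baseline hazard is only estimated up to the last observed time.

```r
library(survival)

# Rebuild the counting-process data and refit the model from the question
temp <- subset(pbc, id <= 312, select = c(id:sex, stage))
pbc2 <- tmerge(temp, temp, id = id, death = event(time, status))
pbc2 <- tmerge(pbc2, pbcseq, id = id,
               bili = tdc(day, bili), protime = tdc(day, protime))

fit <- coxph(Surv(tstart, tstop, death == 2) ~
               log(bili) + log(protime) + age + trt, data = pbc2)

# Covariate profile from the question, assumed fixed over time
covs <- data.frame(bili = 1.9, protime = 12, age = 40, trt = 1)
sf <- survfit(fit, newdata = covs)

last_time <- max(sf$time)  # end of the observed follow-up

# Without extend, a time past last_time typically yields no estimate
beyond <- summary(sf, times = 10000)

# With extend = TRUE the final estimate is carried forward to time 10000
carried <- summary(sf, times = 10000, extend = TRUE)
carried$surv
```

If a real prediction past the end of follow-up is needed, a model with a parametric baseline would be one option to look at, but that is a different model from the coxph fit in the question.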
