Solved – Conditional expected lifetime in survival analysis

predictionrsurvival

I want to do survival analysis with the Colon Cancer data in survival package:

library(survival)
Colon<-subset(colon,etype==2 & rx!='Obs')
fit<-survreg(Surv(time,status)~rx+age+sex,data=Colon, dist="weibull")
Now how can I predict conditional expected lifetime given time, rx, age and sex?

For example, if a 50 years old, male patient under Levamisole treatment has already survived for 2000 days, what would be his expected life time?

I've tried several parameters in predict function, none of them seem to give what I want directly, and I don't know which to choose so I can make some transformation.

Best Answer

One possible approach would be to predict the median survival time (i.e. the expected time at which the survival for a patient with a specific combination of characteristics would be 0.5, in other words, from that timepoint onwards it is more likely that the patient has died than that he is still alive). This can be done using functions from the rms package as follows:

library(rms)

# datadist is needed to fit the models:
dd <- datadist(colon)
options(datadist="dd")

fit <- psm(Surv(time,status)~rx+age+sex,data=colon, dist="weibull", x=TRUE, y=TRUE)

# The Quantile-function, creates a new function using your model for getting 
# quantiles of the predicted survival (by default median)
QFUN <- Quantile(fit)

# Now use QFUN to get the predicted median survival for the patients in your data
pred.med.surv <- QFUN(predict(fit,type="lp"))

# When asking for only one quantile (here 0.5), it drops dimensions of the output, so 
# you would have to use the following to bind the new variable to your data.frame 
colon$pred.med.surv <- as.numeric(dimnames(pred.med.surv)[[2]])

# For multiple quantiles of predicted survival you can use the following:
preds <- QFUN(q=c(0.25,0.5), predict(fit,type="lp"))
colon$surv0.25 <- preds[,"0.25"]
colon$surv0.5 <- preds[,"0.50"]

A few notes:

I see that you delete patients who died 'etype=2' from the dataset. This could lead to bias. In this case you want to evaluate recurrence, death is actually a competing risk for recurrence which calls for different type of survival analysis. Then it would be advisable to call in expert help.
The predicted survival time depends on the patients in the dataset you used to develop the model. If e.g. the model was build using measurements of patients at the day of their diagnosis (day 0), you should also enter the predictor values of the patients at day 0 and not those at day 2000. Since the example patient you mention has already survived up to day 2000, the model would not be valid for him, as his survival is probably longer. Or you could for example include a predictor variable indicating the time since diagnosis.
As I don't know the details of your research project, I would advise you to consult a statistician to determine a valid approach for your research project.

Hope this helps.

Related Solutions

Solved – Modelling mortality rates using Poisson regression

Without seeing the dataset (not available) it seems mostly correct. The nice thing about Poisson regressions is that they can provide rates when used as suggested. One thing that may be worth to keep in mind is that there may be overdispersion where you should switch to a negative binomial regression (see the MASS package).

The Poisson regression doesn't care whether the data as aggregated or not, but in practice non-aggregated data is frail and can cause some unexpected errors. Note that you cannot have surv == 0 for any of the cases. When I've tested the estimates are the same:

set.seed(1)
n <- 1500
data <- 
  data.frame(
    dead = sample(0:1, n, replace = TRUE, prob = c(.9, .1)),
    surv = ceiling(exp(runif(100))*365),
    gender = sample(c("Male", "Female"), n, replace = TRUE),
    diagnosis = sample(0:1, n, replace = TRUE),
    age = sample(60:80, n, replace = TRUE),
    inclusion_year = sample(1998:2011, n, replace = TRUE)
  )

library(dplyr)
model <- 
  data %>% 
  group_by(gender, 
           diagnosis,
           age,
           inclusion_year) %>% 
  summarise(Deaths = sum(dead),
            Person_time = sum(surv)) %>%
  glm(Deaths ~ gender + diagnosis + I(age - 70) + I(inclusion_year - 1998) + offset(log(Person_time/10^3/365.25)), 
      data = . , family = poisson)

alt_model <- glm(dead ~ gender + diagnosis + I(age - 70) + I(inclusion_year - 1998) + offset(log(surv/10^3/365.25)), 
    data = data , family = poisson)
sum(coef(alt_model) - coef(model))
# > 1.779132e-14
sum(abs(confint(alt_model) - confint(model)))
# > 6.013114e-11

As you get a rate it is important to center the variables so that the intercept is interpretable, e.g.:

> exp(coef(model)["(Intercept)"])
(Intercept) 
    51.3771

Can be interpreted as the base rate and then the covariates are rate ratios. If we want the base rate after 10 years:

> exp(coef(model)["(Intercept)"] + coef(model)["I(inclusion_year - 1998)"]*10)
(Intercept) 
     47.427

I've currently modeled the inclusion year as a trend variable but you should probably check for nonlinearities and sometimes it is useful to do a categorization of the time points. I used this approach in this article:

D. Gordon, P. Gillgren, S. Eloranta, H. Olsson, M. Gordon, J. Hansson, and K. E. Smedby, “Time trends in incidence of cutaneous melanoma by detailed anatomical location and patterns of ultraviolet radiation exposure: a retrospective population-based study,” Melanoma Res., vol. 25, no. 4, pp. 348–356, Aug. 2015.

Solved – compare non-nested Cox models

You can compare Cox regression models (coxph) in R with plrtest which is partial likelihood ratio test for non-nested coxph models:

require("survival")
require("nonnestcox") #github.com/thomashielscher/nonnestcox
pbc  <- subset(pbc, !is.na(trt))
mod1 <- coxph(Surv(time, status==2) ~ age, data=pbc, x=T)
mod2 <- coxph(Surv(time, status==2) ~ age + albumin + bili + edema + protime, data=pbc,  x=T)
mod3 <- coxph(Surv(time, status==2) ~ age + log(albumin) + log(bili) + edema + 
log(protime), data=pbc, x=T)
plrtest(mod3, mod2, nested=F) # non-nested models
plrtest(mod3, mod1, nested=T) # nested models

Best Answer

Related Solutions

Solved – Modelling mortality rates using Poisson regression

Solved – compare non-nested Cox models

Related Question