Solved – Confirm or validate the underlying distribution in a survival analysis

Tags: distributions, epidemiology, r, survival, weibull distribution

The following two questions outline how to plot the results of a survival analysis in R: Q1 and Q2.

Both examples, however, assume (or, more precisely, explicitly specify) a Weibull distribution for the survival model.

Referring to survreg() in the survival package for R, the distributions that can be specified include:

  • weibull
  • exponential
  • gaussian
  • logistic
  • lognormal
  • loglogistic

Take the example from the survreg() help page:

library(survival)

# The models below use the lung data set, which ships with the survival package
head(lung)

survival.weibull <- survreg(Surv(time, status) ~ ph.ecog + age + strata(sex),
                            data = lung, dist = 'weibull')

survival.logn <- survreg(Surv(time, status) ~ ph.ecog + age + strata(sex),
                         data = lung, dist = 'lognormal')

survival.logl <- survreg(Surv(time, status) ~ ph.ecog + age + strata(sex),
                         data = lung, dist = 'loglogistic')

How can one verify which distribution is the most appropriate to fit? The model summary reports the Loglik(model) value. Is this the best indicator? Or is there a graphical method to visualise the best fit? It was suggested to me that a QQ-plot may help.

Thanks in advance for any advice.

Best Answer

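Comparing raw log-likelihoods is problematic because not all of these survival models are nested within one another. You can instead use an information criterion such as AIC or BIC. extractAIC() returns the equivalent degrees of freedom followed by the AIC; passing k = log(n) instead of the default k = 2 gives BIC. For example: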

> extractAIC(survival.weibull)
[1]    5.000 2284.504
> extractAIC(survival.logn)
[1]    5.000 2315.142
> extractAIC(survival.logl) 
[1]    5.000 2297.171
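The same comparison can be collected into one table that also includes BIC. This is a minimal sketch: the names `fits`, `n_used` and `ic` are illustrative, and taking n as the number of observations used in the fit (rather than the number of events) is an assumption that some authors would choose differently.

```r
library(survival)

# Refit the three candidate models from the question on the lung data
form <- Surv(time, status) ~ ph.ecog + age + strata(sex)
fits <- list(
  weibull     = survreg(form, data = lung, dist = "weibull"),
  lognormal   = survreg(form, data = lung, dist = "lognormal"),
  loglogistic = survreg(form, data = lung, dist = "loglogistic")
)

# Assumption: n is the number of rows actually used (rows with NAs are dropped)
n_used <- length(fits$weibull$linear.predictors)

# extractAIC() returns c(edf, criterion); k = 2 gives AIC, k = log(n) gives BIC
ic <- t(sapply(fits, function(f) c(
  edf = extractAIC(f)[[1]],
  AIC = extractAIC(f)[[2]],
  BIC = extractAIC(f, k = log(n_used))[[2]]
)))

# Lowest criterion first
ic[order(ic[, "AIC"]), ]
```

Because all three models here have the same number of parameters, AIC and BIC necessarily rank them identically; the two criteria can only disagree when the candidate models differ in complexity.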

Based on AIC, this suggests that the Weibull model fits best, since it has the lowest value. To visualize the fit, you could plot all three parametric fits on top of a non-parametric Kaplan-Meier curve and see how closely each parametric form follows the underlying data. The approach for doing that is described in the first question you linked (Q1).
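As a rough sketch of such an overlay, the code below deliberately simplifies the models to intercept-only fits (an assumption made here so that each distribution yields a single curve; the covariate and strata models in the question would need one curve per covariate pattern). The helper `surv_curve` is hypothetical, not part of the survival package; it uses the fact that survreg() fits log(T) = mu + sigma * W, so S(t) = 1 - F_W((log(t) - mu) / sigma).

```r
library(survival)

# Assumption: intercept-only models, so each distribution gives one curve
dists <- c(weibull = "weibull", lognormal = "lognormal",
           loglogistic = "loglogistic")
fits0 <- lapply(dists, function(d)
  survreg(Surv(time, status) ~ 1, data = lung, dist = d))

# Hypothetical helper: survival function implied by an intercept-only fit
surv_curve <- function(fit, t) {
  z <- (log(t) - coef(fit)[[1]]) / fit$scale
  switch(fit$dist,
         weibull     = exp(-exp(z)),                   # W: minimum extreme value
         lognormal   = pnorm(z, lower.tail = FALSE),   # W: standard normal
         loglogistic = plogis(z, lower.tail = FALSE))  # W: standard logistic
}

# Non-parametric Kaplan-Meier estimate as the reference curve
km <- survfit(Surv(time, status) ~ 1, data = lung)
plot(km, conf.int = FALSE, xlab = "Days", ylab = "Survival probability")

# Overlay each parametric curve on the same axes
t_grid <- seq(1, max(lung$time), length.out = 200)
cols <- c(weibull = "red", lognormal = "blue", loglogistic = "darkgreen")
for (d in names(fits0)) {
  lines(t_grid, surv_curve(fits0[[d]], t_grid), col = cols[[d]], lty = 2)
}
legend("topright", legend = c("Kaplan-Meier", names(fits0)),
       col = c("black", cols), lty = c(1, 2, 2, 2), bty = "n")
```

A parametric curve that stays close to the Kaplan-Meier step function over the whole follow-up period indicates a plausible distributional form; systematic departures in the tails are a reason to prefer another distribution.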
