Survival Analysis – How Parameter Estimates in Survreg Relate to Survival Function in Log-Logistic and Other Distributions

rsurvival

Hello I am learning about survival analysis and introduced to parametric models with survreg from the survival package in R. In this process I was introduced to the idea of different ways to parametrize distributions. I found this topic very confusing so I am trying to wrap my head around it though examples.

Exponential Distribution
set.seed(123)
size <-  1000
deathtime <- rexp(size, rate = 1)
exp.fit <- survreg(Surv(deathtime) ~ 1, dist = "exponential")
exp.fit$icoef #paramter estimates

The intercept of this model is the ln(hazard) or ln($\lambda$) so if I wanted to plot the survival function it would look like this

$ \lambda = exp(intercept)$

$S(t) = exp(-\lambda t)$

I am pretty sure this is correct but correct me if I am wrong.

Weibull Distribution

Now this where things get more confusing for me after this point.

set.seed(123)
size <-  1000
deathtime <- rweibull(size, shape = 3, scale = 2)
wei.fit <- survreg(Surv(deathtime) ~ 1, dist = "weibull")
wei.fit$icoef

The coefficients of the weibull fit are the intercept and log(scale). When I simulated the weibull data I have a shape or $\alpha$ = 3 and scale or $\beta$ = 2. If I wanted to plot the survival function I would need to do this:

$ \beta = exp(intercept)$

$ \alpha = 1/exp(log(scale)) $ I think it is weird that the scale gives the shape parameter

$ S(t) = exp(-(t/\alpha)^\beta) $ Is that correct?

Log-Logistic

This is where I have the limits of my knowledge that I need the most clarification.

loglog.fit <- survreg(Surv(deathtime) ~ 1, dist = "loglogistic")

Fitting the log logistic distribution also provides the intercept and log(scale). If I would like to plot the survival function would this be correct?

$ \beta = exp(intercept) $

$ \alpha = 1/exp(log(scale)) $

$ S(t) = [1 + (t/\alpha)^\beta]^{-1} $ Formula from wiki

Please let me know if my understanding is correct on using the parameters of the survreg function.

Best Answer

Parameterizations can be tricky, and get even more complicated when you move from the the logarithmic time scale, often used to illustrate a parametric model, to an absolute time scale. I find myself re-learning each time I try to answer a question like this.

I find these course notes to be a helpful, succinct summary. They cover several models of the general form

$$\log T = \alpha + \sigma W ,$$

where $\alpha$ is a "location" parameter, $\sigma$ is a "scale" parameter, and $W$ has a specified probability distribution. For a log-logistic model, $W$ is standard logistic, with random values given by rlogis() in R.

The trick is to find how to express the parameters of a log-logistic survival model in terms of $\alpha$ and $\sigma$ above. Those course notes show that the survival function $S(t)$ for a log-logistic model takes the form

$$ S(t) = \frac{1}{1+(\lambda t)^p},$$

where $\alpha = - \log \lambda$ and $\sigma = 1/p$. Those two relationships between parameters in the $\log T$ scale and the $T$ scale hold for many model types discussed in the course notes.

To check this and see what values survreg() returns for coefficients, generate some random log-logistic survival times, following your general approach. I did this for $\lambda = 2$ and $p = 3$, plugging into the formula for $\log T$, then exponentiating:

set.seed(123)
lldeathtimes <- exp(-log(2)+ (1/3)*rlogis(1000))
loglog.fit <- survreg(Surv(lldeathtimes) ~ 1, dist = "loglogistic")
loglog.fit$icoef
#(Intercept)  Log(scale) 
# -0.6985667  -1.1065919 

You can then see how the reported (Intercept) and Log(scale) values correspond to the parameters of the model. Note that the (Intercept) is close to $-\log(2)$, or $\alpha$ in the formula for $\log T$ when $\lambda = 2$. For Log(scale), note that the corresponding scale value is close to $1/3$, or $\sigma$ in the formula for $\log T$ when $p=3$.

If you plot the synthetic survival data and overlay the intended log-logistic survival function in the parameterization I used:

plot(survfit(Surv(lldeathtimes)~1),xlim=c(0,4))
curve(1/(1+(x*2)^3),from=0,to=4,col="blue",add=TRUE)

you will see that they agree well.

So for a model of the above general form for $\log T$, (Intercept) is the "location" parameter $\alpha$ and scale is the "scale" parameter.

What gets confusing, particularly for Weibull models, is that the rweibull() parameterization differs from the used by survreg() with even a switching of names.* From the manual page for survreg.distributions:

survreg scale parameter maps to 1/shape, linear predictor to log(scale)

So you can't count on the word "scale" to mean the same thing, even in a single software package. Combine that with the different ways to parameterize the underlying distributions (what I used from the course notes for log-logistic differs from what you got from Wikipedia) and you have a recipe for massive confusion. I'd recommend avoiding the use of words "shape" or "scale" or "location" to describe the parameters and just use symbols in equations, to be most specific about just what you mean.


*I think that you have $\alpha$ and $\beta$ interchanged in your formulas for your Weibull model. To check, plot the synthetic data and the corresponding intended survival curve, in the way I used for the log-logistic model. Your exponential model seems OK.