Hello I am learning about survival analysis and introduced to parametric models with survreg
from the survival
package in R. In this process I was introduced to the idea of different ways to parametrize distributions. I found this topic very confusing so I am trying to wrap my head around it though examples.
Exponential Distribution
set.seed(123)
size <- 1000
deathtime <- rexp(size, rate = 1)
exp.fit <- survreg(Surv(deathtime) ~ 1, dist = "exponential")
exp.fit$icoef #paramter estimates
The intercept of this model is the ln(hazard) or ln($\lambda$) so if I wanted to plot the survival function it would look like this
$ \lambda = exp(intercept)$
$S(t) = exp(-\lambda t)$
I am pretty sure this is correct but correct me if I am wrong.
Weibull Distribution
Now this where things get more confusing for me after this point.
set.seed(123)
size <- 1000
deathtime <- rweibull(size, shape = 3, scale = 2)
wei.fit <- survreg(Surv(deathtime) ~ 1, dist = "weibull")
wei.fit$icoef
The coefficients of the weibull fit are the intercept and log(scale). When I simulated the weibull data I have a shape or $\alpha$ = 3 and scale or $\beta$ = 2. If I wanted to plot the survival function I would need to do this:
$ \beta = exp(intercept)$
$ \alpha = 1/exp(log(scale)) $ I think it is weird that the scale gives the shape parameter
$ S(t) = exp(-(t/\alpha)^\beta) $ Is that correct?
Log-Logistic
This is where I have the limits of my knowledge that I need the most clarification.
loglog.fit <- survreg(Surv(deathtime) ~ 1, dist = "loglogistic")
Fitting the log logistic distribution also provides the intercept and log(scale). If I would like to plot the survival function would this be correct?
$ \beta = exp(intercept) $
$ \alpha = 1/exp(log(scale)) $
$ S(t) = [1 + (t/\alpha)^\beta]^{-1} $ Formula from wiki
Please let me know if my understanding is correct on using the parameters of the survreg
function.
Best Answer
Parameterizations can be tricky, and get even more complicated when you move from the the logarithmic time scale, often used to illustrate a parametric model, to an absolute time scale. I find myself re-learning each time I try to answer a question like this.
I find these course notes to be a helpful, succinct summary. They cover several models of the general form
$$\log T = \alpha + \sigma W ,$$
where $\alpha$ is a "location" parameter, $\sigma$ is a "scale" parameter, and $W$ has a specified probability distribution. For a log-logistic model, $W$ is standard logistic, with random values given by
rlogis()
in R.The trick is to find how to express the parameters of a log-logistic survival model in terms of $\alpha$ and $\sigma$ above. Those course notes show that the survival function $S(t)$ for a log-logistic model takes the form
$$ S(t) = \frac{1}{1+(\lambda t)^p},$$
where $\alpha = - \log \lambda$ and $\sigma = 1/p$. Those two relationships between parameters in the $\log T$ scale and the $T$ scale hold for many model types discussed in the course notes.
To check this and see what values
survreg()
returns for coefficients, generate some random log-logistic survival times, following your general approach. I did this for $\lambda = 2$ and $p = 3$, plugging into the formula for $\log T$, then exponentiating:You can then see how the reported
(Intercept)
andLog(scale)
values correspond to the parameters of the model. Note that the(Intercept)
is close to $-\log(2)$, or $\alpha$ in the formula for $\log T$ when $\lambda = 2$. ForLog(scale)
, note that the correspondingscale
value is close to $1/3$, or $\sigma$ in the formula for $\log T$ when $p=3$.If you plot the synthetic survival data and overlay the intended log-logistic survival function in the parameterization I used:
you will see that they agree well.
So for a model of the above general form for $\log T$,
(Intercept)
is the "location" parameter $\alpha$ andscale
is the "scale" parameter.What gets confusing, particularly for Weibull models, is that the
rweibull()
parameterization differs from the used bysurvreg()
with even a switching of names.* From the manual page forsurvreg.distributions
:So you can't count on the word "scale" to mean the same thing, even in a single software package. Combine that with the different ways to parameterize the underlying distributions (what I used from the course notes for log-logistic differs from what you got from Wikipedia) and you have a recipe for massive confusion. I'd recommend avoiding the use of words "shape" or "scale" or "location" to describe the parameters and just use symbols in equations, to be most specific about just what you mean.
*I think that you have $\alpha$ and $\beta$ interchanged in your formulas for your Weibull model. To check, plot the synthetic data and the corresponding intended survival curve, in the way I used for the log-logistic model. Your exponential model seems OK.