Survival Analysis – Parametrization of ‘survreg’ in the ‘survival’ R Package

parameterizationrregressionsurvival

I am fitting AFT models using the command survreg from the R package survival. I want to do some further plots of the hazard function but I do not understand what is the parametrization of the AFT model used in this package. The help of this command only indicates:

Description

Fit a parametric survival regression model. These are
location-scale models for an arbitrary transform of the time variable;
the most common cases use a log transformation, leading to accelerated
failure time models.

However, I would like to know if the parametrization in terms of the hazard function is

$$h(t\exp(x^T\beta))\exp(x^T\beta).$$

or

$$h(t\exp(-x^T\beta))\exp(-x^T\beta).$$

but this is not indicated in the manual

https://cran.r-project.org/web/packages/survival/survival.pdf

Best Answer

This might be helpful: https://cran.r-project.org/web/packages/SurvRegCensCov/vignettes/weibull.pdf

Quoting from the first page:

Weibull accelerated failure time regression can be performed in R using the survreg function. The results are not, however, presented in a form in which the Weibull distribution is usually given. Accelerated failure time models are usually given by $\log T=Y=\mu+\alpha^\top z +\sigma W$ ,where $z$ are set of covariates, and $W$ has the extreme value distribution. Given transformations $\gamma = 1/\sigma$, $\lambda =\exp(−\mu/\sigma)$, $\beta =−\alpha/\sigma$, we have a Weibull model with baseline hazard of $h(x|z) = (\gamma \lambda t^{\gamma−1}) exp(\beta^\top z)$.

So, in the AFT model as parametrized in the survreg function, larger values of $\alpha^\top z$ correspond to an increase in expected survival time (longer survival), whereas in the Cox model as parametrized in coxph, larger values of $\beta^\top z$ correspond to an increase in the hazard (shorter survival), and when the AFT error follows the Weibull distribution, they are related by $\beta^\top z = -(\alpha^\top z)/ \sigma$

To confirm directly that the AFT model in R uses $\alpha^\top z$, compare the linear predictors from a fitted AFT model using predict.survreg to the linear predictors calculated 'by hand':

library(survival); library(tidyverse);
# fit aft to lung data using defaults
lung_model_aft <- survreg(Surv(time, status) ~ age + sex + factor(ph.ecog), lung)

all(near(predict(lung_model_aft, type = 'lp'), model.matrix(lung_model_aft) %*% lung_model$coefficients))
# TRUE

We can also check for similarity to the fitted Cox model, but we should only expect them to be similar, not identical, since the Cox model estimates the baseline hazard nonparametrically but the AFT model estimates it parametrically:

- coef(lung_model_aft)[-1] / lung_model_aft$scale
#             age              sex factor(ph.ecog)1 
#     0.009938741     -0.541893431      0.397749179 
#factor(ph.ecog)2 factor(ph.ecog)3 
#     0.906020113      1.857645211 
coef(lung_model_cox)
#             age              sex factor(ph.ecog)1 
#      0.01079468      -0.54583052       0.41004801 
#factor(ph.ecog)2 factor(ph.ecog)3 
#      0.90330301       1.95454338 

Calculating the hazard The general expression for the hazard function at time $x$ is $h(x) = f_T(x)/\Pr(T > x)$, where $f_T(x)$ is the pdf of $T$ at $x$. When $T = \exp\{\mu + \alpha^\top z +\sigma W\}$, then T's distribution is determined by the distribution of $W$. And when $W$ is the extreme value distribution, then $T$ given $z$ is Weibull, and the hazard function is as given above.

The form of the hazard will be different when $W$ is differently distributed. For example, when $W$ is standard normal, then $T$ given $z$ is log-normal. So, $f_T(x) = \frac 1 {x\sigma\sqrt{2\pi}}\ \exp\left(-\frac{\left(\ln x-\mu -\alpha^\top z\right)^2}{2\sigma^2}\right)$ and $\Pr(T > x) = 1 - \Phi\left( \frac{\ln x - \mu - \alpha^\top z} \sigma \right)$, and $$h(x|z) = \dfrac{\frac 1 {x\sigma\sqrt{2\pi}}\ \exp\left(-\frac{\left(\ln x-\mu -\alpha^\top z\right)^2}{2\sigma^2}\right)}{1 - \Phi\left( \frac{\ln x - \mu - \alpha^\top z} \sigma \right)}$$

For other distributions of $W$, the form of the hazard will be different yet. As you may know, the choice of Weibull $T$ (equivalently, Extreme Value $W$) is the only choice that is both an AFT model as well as a proportional hazards model.

I'm not aware of functionality in R to automatically calculate and extract the hazard curves for all observations. So you would likely need to write up a function yourself.

Related Question