Accelerated Failure Model – Finding a Suitable Distribution

accelerated-failure-time, gamma-distribution, survival, weibull-distribution

Can someone please answer these three questions about the AFT model?

  1. In the accelerated failure time (AFT) model, $S(t \mid \beta, x) = S_0(\exp(\beta^T x)\, t)$, where $S_0$ is the baseline survival function, does $S_0$ represent the survival function when there are no covariates, which we can get from the univariate survival time distribution?

  2. The AFT model can be represented as: $Y = \log(T) = \beta^T \mathbf{x} + \epsilon$.
    I read that if $\epsilon$ has a normal distribution, then the baseline distribution $S_0$ is log-normal. How exactly do we determine the distribution of $Y$ from that of $\epsilon$? And do we determine the distribution of $\epsilon$ based on the distribution of $T$?

  3. Also, from what I read, if $\epsilon$ has an extreme value distribution, then $S_0$ is Weibull. If my survival time $T$ (with no covariates) follows a gamma distribution, can I still use a Weibull distribution for $Y$?

Best Answer

Question 1. The baseline survival function in a parametric survival model or a semi-parametric Cox model should be thought of as the survival function when all covariate values are at their reference levels. Just what constitutes those "reference levels" can depend on the parameterization used by the fitting software, so you should read the documentation carefully.

That is not the survival function when you simply ignore all the covariates and examine raw survival times, if that's what you mean by the "survival function when there are no covariates."
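To make the distinction concrete, here is a short sketch in the question's notation. The symbol $S_{\text{marg}}$ is introduced here purely for illustration, and $x = 0$ is assumed to be the reference level. Pooling the raw survival times while ignoring covariates estimates the covariate-averaged survival function

$$S_{\text{marg}}(t) = E_x\left[ S_0\left(\exp(\beta^T x)\, t\right) \right]$$

which coincides with $S_0(t)$ only in special cases, such as $\beta = 0$ or every subject having $x = 0$.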

Question 2. These course notes provide a convenient, compact reference for that representation of a survival model. A somewhat more general representation used there is:

$$Y = \log T = -x'\beta + \sigma W$$

incorporating a scale factor $\sigma$. The general answer to your question is that you can use the standard change-of-variables technique to go back and forth between the distributions of $W$ and $T$. Quoting from those notes (page 8):

the extreme value, generalized extreme value, normal or logistic [distribution for $W$ ... ] leads to Weibull, generalized gamma, log-normal or log-logistic models for T,

providing some standard examples.
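As one worked instance of that change of variables, here is a sketch for the Weibull case. It assumes $\sigma > 0$ and takes the standard minimum extreme value form $P(W > w) = \exp(-e^w)$ for $W$, which is the usual convention but is an assumption here because the notes are not quoted in full. Since $T > t$ exactly when $W > (\log t + x'\beta)/\sigma$,

$$S(t \mid x) = P\left(W > \frac{\log t + x'\beta}{\sigma}\right) = \exp\left(-\exp\left(\frac{\log t + x'\beta}{\sigma}\right)\right) = \exp\left(-\left(t\, e^{x'\beta}\right)^{1/\sigma}\right)$$

which is a Weibull survival function with shape $1/\sigma$ and scale $e^{-x'\beta}$.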

Question 3. As indicated above, if $T$ has a gamma distribution then $W$ has a generalized extreme value distribution. For a standard gamma distribution with shape parameter $k$, $\sigma = 1$ in the above equation, and $W$ has the density

$$f_W(w) = \frac{e^{kw-e^w}}{\Gamma(k)}$$

That reduces to the extreme value distribution associated with Weibull survival times only if $k=1$. Strictly, the answer to your question would be "no" if you knew that the baseline survival function was gamma-distributed with any other value of $k$. In particular, although a Weibull model satisfies the proportional hazards assumption, a gamma model does not, except in that special case.
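The density of $W = \log T$ for a standard gamma $T$ can be checked numerically. The following is a minimal sketch using only the Python standard library; the function names and the choice $k = 2.5$ are illustrative assumptions, not part of the notes. It integrates the density, compares its CDF at one point with simulated log-gamma draws, and shows that $k = 1$ gives the extreme value density.

```python
import math
import random

def log_gamma_density(w, k):
    """Density of W = log T when T ~ Gamma(shape=k, scale=1)."""
    return math.exp(k * w - math.exp(w) - math.lgamma(k))

def extreme_value_density(w):
    """Standard (minimum) extreme value density, the Weibull case."""
    return math.exp(w - math.exp(w))

def trapezoid(f, lo, hi, n=20000):
    """Plain trapezoid rule on [lo, hi] with n intervals."""
    h = (hi - lo) / n
    inner = sum(f(lo + i * h) for i in range(1, n))
    return h * (0.5 * (f(lo) + f(hi)) + inner)

def simulated_cdf(k, w0, n=50000, seed=0):
    """Fraction of simulated log-gamma draws at or below w0."""
    rng = random.Random(seed)
    return sum(math.log(rng.gammavariate(k, 1.0)) <= w0 for _ in range(n)) / n

k = 2.5  # illustrative shape parameter (an assumption for this sketch)

# The density should integrate to about 1 over a wide enough range.
total_mass = trapezoid(lambda w: log_gamma_density(w, k), -40.0, 5.0)

# CDF at w0 = 0.5 from the formula versus from simulated draws.
cdf_formula = trapezoid(lambda w: log_gamma_density(w, k), -40.0, 0.5)
cdf_simulated = simulated_cdf(k, 0.5)
print(total_mass, cdf_formula, cdf_simulated)

# With k = 1 the two densities coincide.
print(log_gamma_density(0.3, 1.0), extreme_value_density(0.3))
```

For any $k \ne 1$ the two densities differ, which is why a Weibull model is not strictly correct for gamma-distributed baseline survival.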

I'm reluctant to say that's the answer, however, because you seem to want to use a covariate-free survival function to estimate the baseline survival, and that's incorrect as discussed for Question 1. It's possible that a Weibull model would work adequately. One way to approach this problem is to examine the distribution of residuals from the above formula after you've fit a particular model form to estimate $\beta$, typically a vector, and the scalar $\sigma$ (e.g., assuming Weibull, generalized gamma, log-normal or log-logistic):

$$W = \frac{\log T + x'\beta}{\sigma} $$

and see how well it matches the corresponding expected distribution of residuals (e.g., extreme value, generalized extreme value, normal or logistic). See for example Harrell's Regression Modeling Strategies. Parametric survival modeling and this type of validation of an AFT model are covered in Chapter 18 of the textbook, and also in associated course notes (although the particular chapter can vary as the notes are revised).
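The residual check described above can be sketched as follows. This is a simplified illustration under loud assumptions: the data are simulated, there is no censoring, and the true $\beta$ and $\sigma$ are plugged in where a real analysis would use fitted estimates. All function names are hypothetical. Censored data would need a censoring-aware estimate of the residual distribution rather than raw empirical quantiles.

```python
import math
import random

def aft_residuals(times, covariates, beta, sigma):
    """W_i = (log T_i + x_i' beta) / sigma, using the sign convention above."""
    return [
        (math.log(t) + sum(b * xj for b, xj in zip(beta, x))) / sigma
        for t, x in zip(times, covariates)
    ]

def extreme_value_quantile(p):
    """Quantile of the standard minimum extreme value law, CDF 1 - exp(-e^w)."""
    return math.log(-math.log(1.0 - p))

def max_quantile_gap(residuals, probs=(0.1, 0.25, 0.5, 0.75, 0.9)):
    """Largest gap between empirical and extreme value quantiles."""
    ordered = sorted(residuals)
    n = len(ordered)
    return max(abs(ordered[int(p * n)] - extreme_value_quantile(p)) for p in probs)

def simulate_times(covariates, beta, sigma, draw_w):
    """T = exp(-x'beta + sigma * W), with W drawn by draw_w()."""
    return [
        math.exp(-sum(b * xj for b, xj in zip(beta, x)) + sigma * draw_w())
        for x in covariates
    ]

rng = random.Random(1)
beta, sigma = [0.5, -0.3], 0.7  # assumed "true" values for the simulation
covariates = [[rng.gauss(0.0, 1.0), rng.random()] for _ in range(5000)]

# log of an Exp(1) draw follows the standard minimum extreme value law.
weibull_times = simulate_times(
    covariates, beta, sigma, lambda: math.log(rng.expovariate(1.0)))
# Normal errors give log-normal survival times instead.
lognormal_times = simulate_times(
    covariates, beta, sigma, lambda: rng.gauss(0.0, 1.0))

gap_weibull = max_quantile_gap(
    aft_residuals(weibull_times, covariates, beta, sigma))
gap_lognormal = max_quantile_gap(
    aft_residuals(lognormal_times, covariates, beta, sigma))
print(gap_weibull, gap_lognormal)
```

A small gap for the Weibull data and a clearly larger one for the log-normal data is the pattern the check relies on: residuals from a well-chosen model form should track the quantiles of its assumed error distribution.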