Survival Analysis – Interpreting Output of Weibull Accelerated Failure Time Model

Tags: interpretation, self-study, survival, weibull-distribution

In this case study I have to assume a baseline Weibull distribution, so I'm fitting an accelerated failure time (AFT) model, which I will later interpret in terms of both hazard ratios and survival times.

The data looks like this.

head(data1.1)

TimeSurv IndSurv Treat Age
1     6 days       1     D  27
2    33 days       1     D  43
3   361 days       1     I  36
4   488 days       1     I  54
5   350 days       1     D  49
6   721 days       1     I  49
7  1848 days       0     D  32
8   205 days       1     D  47
9   831 days       1     I  24
10  260 days       1     I  38

I'm fitting the model with the WeibullReg() function in R. The survival object is built with TimeSurv as the survival times and IndSurv as the event indicator (1 = event observed, 0 = censored). The covariates are Treat and Age.

My issue deals with understanding the output properly:

wei1 = WeibullReg(Surv(TimeSurv, IndSurv) ~ Treat + Age, data=data1.1)
wei1


$formula
Surv(TimeSurv, IndSurv) ~ Treat + Age

$coef
            Estimate           SE
lambda  0.0009219183 0.0006803664
gamma   0.9843411517 0.0931305471
TreatI -0.5042111027 0.2303038312
Age     0.0180225253 0.0089632209

$HR
              HR       LB       UB
TreatI 0.6039819 0.384582 0.948547
Age    1.0181859 1.000455 1.036231

$ETR
             ETR        LB        UB
TreatI 1.6690124 1.0574337 2.6343045
Age    0.9818574 0.9644488 0.9995801

$summary

Call:
survival::survreg(formula = formula, data = data, dist = "weibull")
               Value Std. Error     z      p
(Intercept)  7.10024    0.41283 17.20 <2e-16
TreatI       0.51223    0.23285  2.20  0.028
Age         -0.01831    0.00913 -2.01  0.045
Log(scale)   0.01578    0.09461  0.17  0.868

Scale= 1.02 

Weibull distribution
Loglik(model)= -599.1   Loglik(intercept only)= -604.1
    Chisq= 9.92 on 2 degrees of freedom, p= 0.007 
Number of Newton-Raphson Iterations: 5 
n= 120

I don't really understand how Scale = 1.02 relates to Log(scale) = 0.0158. Also, the p-value for Log(scale) is large and non-significant, and the function's documentation shows that it converts the survreg estimates using the scale. Should I therefore conclude that the alpha estimates are also not to be trusted, since they were computed using the scale value?

Best Answer

Many (including me) get confused by the different ways to define the parameters of a Weibull distribution, particularly since the standard R Weibull-related functions in the stats package and the survreg() parametric fitting function in the survival package use different parameterizations.

The manual page for the R Weibull-related functions in stats says:

The Weibull distribution with shape parameter $a$ and scale parameter $b$ has density given by $$\frac{a}{b}\left(\frac{x}{b}\right)^{a-1}e^{-(x/b)^{a}}$$ for $x > 0$.

That's called the "standard parameterization" on the Wikipedia page (where they use $k$ for shape and $\lambda$ for scale).

The survreg() function uses a different parameterization, with differences explained on its manual page:

There are multiple ways to parameterize a Weibull distribution. The survreg function embeds it in a general location-scale family, which is a different parameterization than the rweibull function, and often leads to confusion.

survreg's scale = 1/(rweibull shape)

survreg's intercept = log(rweibull scale).
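A quick way to convince yourself that the two parameterizations describe the same distribution is to compute the survival function both ways. This base-R sketch uses arbitrary illustrative values for shape and scale (not taken from the question) and relies on the fact that survreg models log(T) = intercept + scale × W, where W has a standard minimum extreme value distribution with P(W > w) = exp(-exp(w)).

```r
# Illustrative values only (not from the question)
shape_std <- 1.5   # stats "shape" (a), as in dweibull()/rweibull()
scale_std <- 200   # stats "scale" (b)

# Documented survreg conversions (intercept-only model)
sr_scale     <- 1 / shape_std   # survreg's scale
sr_intercept <- log(scale_std)  # survreg's intercept

times <- c(10, 50, 200, 800)

# Survival in the stats parameterization: S(t) = exp(-(t / b)^a)
surv_std <- pweibull(times, shape = shape_std, scale = scale_std,
                     lower.tail = FALSE)

# survreg form: log(T) = intercept + scale * W, with P(W > w) = exp(-exp(w))
w <- (log(times) - sr_intercept) / sr_scale
surv_sr <- exp(-exp(w))

print(cbind(times, surv_std, surv_sr))
stopifnot(isTRUE(all.equal(surv_std, surv_sr)))
```

Algebraically, (log(t) - log(b)) / (1/a) = a log(t/b), so exp(-exp(w)) is exactly exp(-(t/b)^a).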

The WeibullReg() function effectively takes the result from survreg() and expresses the results in terms of the "standard parameterization."
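These conversions can be checked against the numbers in the question. The sketch below copies the survreg estimates from the $summary table and recomputes the WeibullReg quantities in base R. It assumes that WeibullReg writes the Weibull survival function as S(t) = exp(-lambda * t^gamma), so that gamma is the "standard" shape while lambda = exp(-intercept/scale) is a rate-type parameter rather than the rweibull scale itself. That assumption is consistent with the printed values but is not stated in the question.

```r
# survreg estimates copied from the $summary table in the question
intercept <- 7.10024
log_scale <- 0.01578
alpha     <- c(TreatI = 0.51223, Age = -0.01831)  # AFT (log-time) coefficients

sigma       <- exp(log_scale)           # survreg "Scale"
gamma_shape <- 1 / sigma                # Weibull shape ("gamma" in WeibullReg)
# Assumes S(t) = exp(-lambda * t^gamma_shape), i.e. lambda = b^(-gamma_shape)
lambda      <- exp(-intercept / sigma)
beta        <- -alpha / sigma           # proportional-hazards (log HR) coefficients

print(signif(c(lambda = lambda, gamma = gamma_shape), 4))
print(round(beta, 4))        # compare with the TreatI and Age rows of $coef
print(round(exp(beta), 4))   # compare with $HR
print(round(exp(alpha), 4))  # compare with $ETR
```

Small differences in the last digits come from the rounding of the printed survreg values. Note the sign flip: a positive AFT coefficient (longer survival, ETR > 1) corresponds to a negative log hazard ratio (HR < 1), as for TreatI.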

There is a potential confusion, however, as the $summary of the object produced by WeibullReg is "the summary table from the original survreg model." (Emphasis added.) So what you have displayed in the question includes results for both parameterizations.

That dual representation of the results helps explain what's going on.

Starting from the bottom, the survreg value of scale is the reciprocal of the "standard parameterization" value of shape. The "standard" shape parameter is called gamma in the WeibullReg $coef output near the top of your output. The value for gamma is 0.98434, with a reciprocal of 1.0159, which rounds to the value of 1.02 shown as Scale in the last line of the survreg summary. The natural logarithm of 1.0159 is 0.01578, shown as Log(scale) in the line above it. Those last lines of your output, remember, are based on the survreg definition of scale.
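The arithmetic in this paragraph can be reproduced directly. The sketch also shows that the standard errors line up under the delta method, SE(gamma) ≈ gamma × SE(log(scale)); that this is how WeibullReg computes its standard error is an assumption here, though it matches the printed values.

```r
# WeibullReg estimate and SE for gamma (the "standard" Weibull shape)
gamma_hat <- 0.9843411517
gamma_se  <- 0.0931305471

scale_hat     <- 1 / gamma_hat    # survreg "Scale"
log_scale_hat <- log(scale_hat)   # survreg "Log(scale)"

# Delta method: gamma = exp(-log(scale)), so SE(gamma) ~= gamma * SE(log(scale))
log_scale_se <- gamma_se / gamma_hat

print(round(c(scale = scale_hat, log_scale = log_scale_hat,
              log_scale_se = log_scale_se), 5))
```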

The p-value for that Log(scale) is indeed very high. But that just means that Log(scale) is not significantly different from 0, or equivalently that the scale itself (as defined in survreg) is not significantly different from 1. A large p-value here does not mean the scale is poorly estimated: its standard error is about 0.095 on the log scale, and the estimate simply lies close to 1. That test says nothing against the hazard ratios or event-time ratios for the covariates, which have their own standard errors and tests. It just means that the baseline survival curve of your Weibull model can't be statistically distinguished from a simple exponential survival curve, which has a survreg scale (equivalently, a "standard" shape) of exactly 1 and a constant baseline hazard over time. So there is nothing to distrust about your results on that basis.
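The high p-value is just the Wald test of Log(scale) = 0 from the survreg table, and it can be recomputed in base R from the printed estimate and standard error:

```r
est <- 0.01578   # Log(scale) estimate from the survreg table
se  <- 0.09461   # its standard error

z <- est / se               # Wald statistic for H0: log(scale) = 0, i.e. scale = 1
p <- 2 * pnorm(-abs(z))     # two-sided p-value from the standard normal
print(round(c(z = z, p = p), 3))
```

This reproduces the z of 0.17 and p of 0.868 in the table; neither quantity enters the computation of the covariate hazard ratios.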

Related Solutions

Solved – Equivalence of Poisson and Weibull PH regression in a survival setting

Poisson regression is not equivalent to a Weibull survival model. Instead, it assumes an exponential distribution, in which the baseline hazard is not only proportional but constant over time.

The Weibull model relaxes this assumption somewhat. Because the exponential distribution is the special case of the Weibull distribution with shape 1, the two models will give you the same answer if the underlying survival distribution really is exponential, but they need not agree otherwise.
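The special-case relationship is visible in the Weibull hazard h(t) = (a/b)(t/b)^(a-1): with shape a = 1 it no longer depends on t. In this base-R sketch, weibull_hazard() is a hypothetical helper introduced for illustration, and the scale value is arbitrary.

```r
# Hypothetical helper: Weibull hazard h(x) = f(x) / S(x) in the stats parameterization
weibull_hazard <- function(x, shape, scale) {
  dweibull(x, shape = shape, scale = scale) /
    pweibull(x, shape = shape, scale = scale, lower.tail = FALSE)
}

times <- c(1, 10, 100, 1000)
b <- 500  # illustrative scale

h_exp  <- weibull_hazard(times, shape = 1,   scale = b)  # exponential special case
h_rise <- weibull_hazard(times, shape = 1.5, scale = b)  # shape > 1: hazard increases

print(rbind(h_exp, h_rise))
stopifnot(isTRUE(all.equal(h_exp, rep(1 / b, length(times)))))  # constant at 1/b
stopifnot(isTRUE(all.equal(dweibull(times, 1, b), dexp(times, rate = 1 / b))))
```

Only the shape-1 case has the constant hazard that a Poisson model implicitly assumes.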

It appears one such circumstance is your data.

R – Relationship between Gumbel and Weibull Distributions and Survival Analysis

The confusion comes from competing definitions of "Gumbel distribution" and competing parameterizations of the Weibull distribution.

(1) It might be best to avoid the term "Gumbel distribution" because it has different interpretations.

One is a maximum extreme value distribution, the definition used in Wikipedia. "This article uses the Gumbel distribution to model the distribution of the maximum value." (Emphasis in original.)

Another is a minimum extreme value distribution, the definition provided by Wolfram: "In this work, the term 'Gumbel distribution' is used to refer to the distribution corresponding to a minimum extreme value distribution." (Emphasis added.) Mathematica uses this definition for its GumbelDistribution, and calls the Wikipedia maximum extreme value version ExtremeValueDistribution.

It's the minimum extreme value version that provides the "standard result" for the association between Weibull and Gumbel distributions. As you used the maximum extreme value version, you got the result that you found.

(2) Continuing from point (1), to make this work you have to alter (a) the relationship between $\alpha$ and $\beta$ to get a mean of 0, and (b) the CDF to match the minimum extreme value Gumbel.

(a) The mean of the minimum extreme value version is $\alpha - \gamma \beta$, where $\gamma$ is Euler's gamma, with $\alpha$ and $\beta$ as represented in the question. That's different from $\alpha + \gamma \beta$ for the maximum extreme value version, as used in the question.

(b) The $q$th quantile (inverse CDF) of the minimum extreme value version is:

$$\alpha +\beta \log (-\log (1-q)).$$

The inverse CDF used in the question's code is for the maximum extreme value version.

I haven't yet made those replacements in the code, but I suspect that (absent other problems) all will then be OK.
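Both fixes in (a) and (b) can be checked numerically. If T is Weibull with shape k and scale λ in the stats parameterization, log(T) should follow the minimum extreme value distribution with location α = log(λ) and scale β = 1/k. This base-R sketch uses arbitrary illustrative values of k and λ (not the question's); it compares quantiles directly and checks the mean by numerically integrating the log-Weibull quantile function, using -digamma(1) for Euler's gamma.

```r
k      <- 2     # Weibull shape (illustrative)
lambda <- 300   # Weibull scale (illustrative)

alpha <- log(lambda)  # location of the minimum extreme value distribution
beta  <- 1 / k        # scale of the minimum extreme value distribution

# (b) quantiles: log of Weibull quantiles vs. the minimum extreme value formula
q <- c(0.01, 0.1, 0.5, 0.9, 0.99)
log_weibull_q <- log(qweibull(q, shape = k, scale = lambda))
min_ev_q      <- alpha + beta * log(-log(1 - q))
stopifnot(isTRUE(all.equal(log_weibull_q, min_ev_q)))

# (a) mean: E[log T] = integral over (0, 1) of the quantile function of log T
euler_gamma <- -digamma(1)
mean_log_T  <- integrate(function(p) log(qweibull(p, shape = k, scale = lambda)),
                         lower = 0, upper = 1, rel.tol = 1e-8)$value
print(c(numeric = mean_log_T, formula = alpha - euler_gamma * beta))
```

The numerical mean agrees with α - γβ, not α + γβ, which is the sign difference described in (a).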

(3) The question of "exactly what distribution specification R is using when fitting a Weibull distribution" is not well specified.

R packages can differ in parameterizations, and the same function might use different parameterizations depending on the arguments in the function call. This page provides some examples. Notably, as the manual page for the survreg() function in the survival package explains:

There are multiple ways to parameterize a Weibull distribution. The survreg function embeds it in a general location-scale family, which is a different parameterization than the rweibull function, and often leads to confusion.

survreg's scale  =    1/(rweibull shape)
survreg's intercept = log(rweibull scale)

I don't see any way around these types of confusions, except to be extremely careful in reading specific definitions and manual pages.
