Solved – Why do coxph() and survreg(, dist="exponential") NOT return the same coefficient (except for the expected opposite sign) in R

cox-model, r, survival, weibull distribution

If I understand correctly, the coefficient of a covariate X under a Weibull accelerated failure time (AFT) model is related to the log(hazard ratio) of the Cox proportional hazard (PH) model in the following way:

d1 = -c1/shape.gamma

where d1 is the AFT coefficient, c1 is the Cox PH coefficient log(hazard ratio), and shape.gamma is the shape parameter of the Weibull distribution. See for example slide 36 in this handout.

So for shape.gamma=1 (the Weibull distribution is then equal to the exponential distribution) d1 and c1 should be equal (except for sign).
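For reference, the relationship can be derived in a few lines. This is a sketch under the usual location-scale form of the Weibull AFT model. The symbols $\mu$ (intercept), $\sigma$ (AFT scale), $W$ (error term) and $h_0$ (baseline hazard) are introduced here for illustration and do not appear in the question; $\gamma$ stands for shape.gamma, $d_1$ for d1 and $c_1$ for c1.

$$\log T = \mu + d_1 X + \sigma W, \qquad \sigma = 1/\gamma, \qquad P(W > w) = \exp(-e^{w})$$

$$S(t \mid X) = \exp\left\{-\left(t e^{-\mu}\right)^{\gamma} e^{-\gamma d_1 X}\right\}$$

$$h(t \mid X) = h_0(t)\, e^{-\gamma d_1 X}, \qquad h_0(t) = \gamma e^{-\mu\gamma} t^{\gamma - 1}$$

Matching this with the Cox form $h(t \mid X) = h_0(t)\, e^{c_1 X}$ gives $c_1 = -\gamma d_1$, that is, $d_1 = -c_1/\gamma$.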

My question is: why, using the survival package in R, do coxph() and survreg(, dist="exponential") NOT return the same coefficient estimates (in absolute value)? Below is the code I used to check this. I first simulate uncensored data using the exponential AFT parametrization and, as expected, survreg() returns 'on average' d1=0.5 (the value I used to simulate the data). So far so good. However, the coefficient returned by coxph() is quite different and seems to depend very much on the scale.lambda parameter of the exponential distribution. How does this mismatch relate to the theoretical equivalence (in absolute value) of the coefficients of the exponential AFT model and the Cox PH model? What am I missing?

library(survival)
  # sample size
n            <- 1000
scale.lambda <- 2
d1           <- 0.5
ph.ests      <- c()
aft.ests     <- c()
X1           <- c(rep(0,n/2),rep(1,n/2))
for(i in 1:100){
 e        <- rexp(n,scale.lambda)
   # AFT model
 logT     <- -d1*X1 + e
 T        <- exp(logT)
   # no censoring
 event    <- rep(1,n) 
   # cox ph analysis
 m.ph     <- coxph(Surv(T,event)~X1)
   # exponential AFT analysis
 m.aft    <- survreg(Surv(T,event)~X1,dist="exponential")
   # collect X1 coefficients (on log scale)
 ph.ests  <- c(ph.ests,coef(m.ph))
 aft.ests <- c(aft.ests,coef(m.aft)[2])
}
  # summary of 100 replications
summary(aft.ests)
summary(ph.ests)

Best Answer

It has just been pointed out to me that I made a mistake in my simulation. My theoretically based assumption was sound: under the exponential AFT model, coxph() and survreg(, dist="exponential") should on average return the same coefficient in absolute value. However, my implementation of the exponential AFT model in the R simulation was incorrect. I simulated:

logT     <- -d1*X1 + e
T        <- exp(logT)

However, under my assumptions, it is not logT that is exponentially distributed, but T. So instead, I should have replaced the above two lines in my code with:

T     <- exp(-d1*X1)*e

After this change, the code produced the results I expected.
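For illustration, here is a minimal sketch of the corrected simulation. It is a reworking of the question's loop with the one-line fix applied, not the author's exact script. The names `rate`, `n.rep` and `time`, and the call to `set.seed()`, are choices made here; `rate` replaces `scale.lambda` because the second argument of `rexp()` is a rate.

```r
library(survival)

set.seed(1)                       # assumption: fixed seed, for reproducibility only
n     <- 1000                     # sample size
rate  <- 2                        # rexp()'s second argument is a rate, not a scale
d1    <- 0.5
X1    <- rep(c(0, 1), each = n / 2)
event <- rep(1, n)                # no censoring

n.rep    <- 100
ph.ests  <- numeric(n.rep)
aft.ests <- numeric(n.rep)

for (i in seq_len(n.rep)) {
  e    <- rexp(n, rate)
  time <- exp(-d1 * X1) * e       # T itself (not log T) is exponential
  # Cox PH analysis
  ph.ests[i]  <- coef(coxph(Surv(time, event) ~ X1))
  # exponential AFT analysis; element 2 is the X1 coefficient
  aft.ests[i] <- coef(survreg(Surv(time, event) ~ X1, dist = "exponential"))[2]
}

# summary of 100 replications
summary(ph.ests)
summary(aft.ests)
```

With this data-generating step the two sets of estimates should have roughly the same magnitude and opposite signs, and that magnitude should no longer depend on `rate`.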

Related Solutions

Solved – AFT model with time varying independent variables

In response to this part of the question:

Can the survreg function of the survival package handle a combination of time-varying and time-fixed independent variables?

the answer is no. However, both the flexsurvreg function from the flexsurv package and the aftreg function from the eha package can do this, and the syntax is very similar to that of the survival::survreg function.

The discussion on this question may be useful.

R – Relationship between Gumbel and Weibull Distributions and Survival Analysis

The confusion comes from competing definitions of "Gumbel distribution" and competing parameterizations of the Weibull distribution.

(1) It might be best to avoid the term "Gumbel distribution" because it has different interpretations.

One is a maximum extreme value distribution, the definition used in Wikipedia. "This article uses the Gumbel distribution to model the distribution of the maximum value." (Emphasis in original.)

Another is a minimum extreme value distribution, the definition provided by Wolfram. "In this work, the term 'Gumbel distribution' is used to refer to the distribution corresponding to a minimum extreme value distribution." (Emphasis added.) That is used by Mathematica for its GumbelDistribution, which calls the Wikipedia maximum extreme value version the ExtremeValueDistribution.

It's the minimum extreme value version that provides the "standard result" for the association between Weibull and Gumbel distributions. As you used the maximum extreme value version, you got the result that you found.

(2) Continuing from point (1), to make this work you have to alter (a) the relationship between $\alpha$ and $\beta$ to get a mean of 0, and (b) the CDF to match the minimum extreme value Gumbel.

(a) The mean of the minimum extreme value version is $\alpha - \gamma \beta$, where $\gamma$ is the Euler-Mascheroni constant (about 0.5772), with $\alpha$ and $\beta$ as represented in the question. That's different from $\alpha + \gamma \beta$ for the maximum extreme value version, as used in the question.

(b) The $q$th quantile (inverse CDF) of the minimum extreme value version is:

$$\alpha +\beta \log (-\log (1-q)).$$

The inverse CDF used in the question's code is for the maximum extreme value version.

I haven't yet done those replacements in the code, but I suspect that (absent other problems) all will then be OK.
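A quick numerical check of points (a) and (b) can be sketched in base R. This is not the question's code. It assumes the standard relationship that if T is Weibull with a given shape and scale, then log T follows the minimum extreme value distribution with $\alpha$ = log(scale) and $\beta$ = 1/shape; the values of `shape` and `scale` below are arbitrary illustrations.

```r
set.seed(42)              # assumption: fixed seed, for reproducibility only
shape <- 2                # hypothetical Weibull shape
scale <- 3                # hypothetical Weibull scale
alpha <- log(scale)       # location of log(T)
beta  <- 1 / shape        # scale of log(T)

# (b) quantiles: minimum extreme value formula vs. log of Weibull quantiles
q             <- c(0.1, 0.25, 0.5, 0.75, 0.9)
gumbel.min.q  <- alpha + beta * log(-log(1 - q))
weibull.log.q <- log(qweibull(q, shape = shape, scale = scale))
max.abs.diff  <- max(abs(gumbel.min.q - weibull.log.q))

# (a) mean: alpha - gamma * beta, with gamma the Euler-Mascheroni constant
euler.gamma <- -digamma(1)
log.t       <- log(rweibull(1e5, shape = shape, scale = scale))
sim.mean    <- mean(log.t)
theory.mean <- alpha - euler.gamma * beta

max.abs.diff
c(simulated = sim.mean, theoretical = theory.mean)
```

The quantile comparison is an exact identity up to floating-point error, while the mean comparison is a simulation and only agrees approximately.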

(3) The question of "exactly what distribution specification R is using when fitting a Weibull distribution" is not well specified.

R packages can differ in parameterizations, and the same function might use different parameterizations depending on the arguments in the function call. This page provides some examples. Notably, as the manual page for the survreg() function in the survival package explains:

There are multiple ways to parameterize a Weibull distribution. The survreg function embeds it in a general location-scale family, which is a different parameterization than the rweibull function, and often leads to confusion.

survreg's scale  =    1/(rweibull shape)
survreg's intercept = log(rweibull scale)

I don't see any way around these types of confusions, except to be extremely careful in reading specific definitions and manual pages.
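The mapping quoted from the survreg() manual page can be checked with a small simulation. This is a sketch, with the sample size, seed and the names `true.shape` and `true.scale` chosen here for illustration: data are drawn with rweibull() and an intercept-only survreg() fit is converted back to the rweibull() parameterization.

```r
library(survival)

set.seed(123)             # assumption: fixed seed, for reproducibility only
n          <- 5000
true.shape <- 2           # hypothetical rweibull shape
true.scale <- 3           # hypothetical rweibull scale

y   <- rweibull(n, shape = true.shape, scale = true.scale)
fit <- survreg(Surv(y, rep(1, n)) ~ 1, dist = "weibull")   # no censoring

# survreg's scale     = 1 / (rweibull shape)
# survreg's intercept = log(rweibull scale)
est.shape <- 1 / fit$scale
est.scale <- exp(unname(coef(fit)[1]))

c(shape = est.shape, scale = est.scale)
```

The back-transformed estimates should land close to the values passed to rweibull(), which is an easy way to confirm which parameterization a fitted model is reporting.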
