If I understand correctly, the coefficient of a covariate X
under a Weibull accelerated failure time (AFT) model is related to the log(hazard ratio)
of the Cox proportional hazard (PH) model in the following way:
d1 = -c1/shape.gamma
where d1
is the AFT coefficient, c1
is the Cox PH coefficient log(hazard ratio)
, and shape.gamma
is the shape parameter of the Weibull distribution. See for example slide 36 in this handout.
So for shape.gamma=1
(the Weibull distribution is then equal to the exponential distribution) d1
and c1
should be equal (except for sign).
My question is: Why, using the survival package in R, do coxph()
and survreg(, dist="exponential")
NOT return the same coefficient estiamtes (in absolute value)? Below is the code I used to check this. I first simulate uncensored data using the exponential AFT parametrization and, as expected, survreg()
returns 'on average' d1=0.5
(the value I used to simulate the data). So far so good, However, the coefficient returned by coxph is quite different and seems to depend very much on the scale.lambda
parameter of the exponential distribution. How does this mismatch relate to the theoretical equivalence (in absolute value) of the coefficients of the exponential AFT model and the Cox PH model? What am I missing?
library(survival)
# sample size
n <- 1000
scale.lambda <- 2
d1 <- 0.5
ph.ests <- c()
aft.ests <- c()
X1 <- c(rep(0,n/2),rep(1,n/2))
for(i in 1:100){
e <- rexp(n,scale.lambda)
# AFT model
logT <- -d1*X1 + e
T <- exp(logT)
# no censoring
event <- rep(1,n)
# cox ph analysis
m.ph <- coxph(Surv(T,event)~X1)
# exponential AFT analysis
m.aft <- survreg(Surv(T,event)~X1,dist="exponential")
# collect X1 coefficients (on log scale)
ph.ests <- c(ph.ests,coef(m.ph))
aft.ests <- c(aft.ests,coef(m.aft)[2])
}
# summary of 100 replications
summary(aft.ests)
summary(ph.ests)
Best Answer
It has just been pointed out to me that I made a mistake in my simulation. So my theoretically based assumption was sound: under the exponential AFT model coxph() and survreg(,dist="exponential") should provide on average the same answer. However my implementation of the exponential AFT model in the R simulation was incorrect. I simulated:
However, under my assumptions, it is not logT that is exponentially distributed, but T. So instead, I should have replaced the above two lines in my code with:
After this change, the code produced the results I expected.