Since the seasonal dummy variables are static by nature, and their coefficients clearly vary with the time variable, how much does it matter?
The value that you get is a form of average over time. Unfortunately, since time-to-event analyses naturally have more subjects at risk early on, you can't simply assume that the effect is weighted evenly throughout the follow-up. In my experience the initial period has a much heavier impact on the estimate than the later period, because the partial likelihood contributions come from the risk sets at each event time and those risk sets are largest at the start.
I get that statistically it means something, but does the violation of the PH assumption invalidate the (intuitively appealing) result that non-response is more likely to happen in the summer and winter?
Again, it probably doesn't, but you can't be sure until you've checked. Something is definitely happening over time, and you should at least have a look at the residual plot (plot(cox.zph(...))). It isn't entirely surprising that you have a problem with PH, since the seasons are part of the time scale and there will be situations where early summer and late spring are similar.
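A minimal sketch of that check, using the lung data from the survival package as a stand-in (the covariates here are placeholders for your seasonal dummies):

library(survival)

# Fit the Cox model, then test the proportional hazards assumption
fit <- coxph(Surv(time, status) ~ ph.ecog + age, data = lung)
zp <- cox.zph(fit)

print(zp) # per-covariate and global tests of PH
plot(zp)  # scaled Schoenfeld residuals vs. time; a non-flat smooth
          # suggests the effect drifts over time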
If so, is there a way to handle this so that the PH assumption is not violated? I know about using the tt transform, but I can't seem to figure out the exact form for the function.
The tt transform is tricky to use with big data. It explodes the model matrix and can get a little messy, e.g. if you modify the lung example in the survival package:
library(survival)

# tt() is applied to the expanded data: one row per subject at risk
# at every event time, so length(x) reveals the blow-up
coxph(Surv(time, status) ~ ph.ecog + tt(age), data = lung,
      tt = function(x, t, ...) {
        print(length(x))
        pspline(x + t/365.25)
      })
It prints 15809, while there are only 228 rows in the original dataset. The principle of tt() is that it feeds the variables into the transformation function, where you are free to use time any way you wish. Note that you can also have a different transformation function for each variable:
library(survival)

# Supply one transformation function per tt() term, in order of appearance
coxph(Surv(time, status) ~ tt(ph.ecog) + tt(age), data = lung,
      tt = list(
        # ph.ecog: linear, time-shifted, and squared time-shifted terms
        function(x, t, ...) {
          cbind(x, x + t/365.25, (x + t/365.25)^2)
        },
        # age: penalized spline of the time-shifted value
        function(x, t, ...) {
          pspline(x + t/365.25)
        }),
      x = TRUE,
      y = TRUE) -> fit
head(fit$x)
Gives:
tt(ph.ecog)x tt(ph.ecog) tt(ph.ecog) tt(age)1 tt(age)2 tt(age)3 tt(age)4
6 1 3.4 11.7 0 0 0 0.000
3 0 2.4 5.8 0 0 0 0.020
38 1 3.4 11.7 0 0 0 0.000
5 0 2.4 5.8 0 0 0 0.000
6.1 1 3.2 10.4 0 0 0 0.000
3.1 0 2.2 5.0 0 0 0 0.026
tt(age)5 tt(age)6 tt(age)7 tt(age)8 tt(age)9 tt(age)10 tt(age)11 tt(age)12
6 0.00 0.00000 0.000 0.0052 0.359 0.58 0.053 0
3 0.48 0.48232 0.021 0.0000 0.000 0.00 0.000 0
38 0.00 0.00087 0.266 0.6393 0.094 0.00 0.000 0
5 0.03 0.51933 0.437 0.0136 0.000 0.00 0.000 0
6.1 0.00 0.00000 0.000 0.0078 0.388 0.56 0.044 0
3.1 0.50 0.45457 0.016 0.0000 0.000 0.00 0.000 0
I therefore try to avoid this solution and use the time-split approach that I wrote about here and in my answer to my own question.
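A minimal sketch of the time-split idea, assuming piecewise-constant effects are acceptable; the cut points are arbitrary and lung again stands in for your own data:

library(survival)

# lung codes status as 1 = censored, 2 = dead; recode to a 0/1 event indicator
lung$status <- lung$status - 1

# Split each subject's follow-up at the cut points (days); every row
# then covers one interval, indexed by "tgroup"
lung2 <- survSplit(Surv(time, status) ~ ., data = lung,
                   cut = c(180, 360), episode = "tgroup")

# Allow the covariate's effect to differ between the time intervals
coxph(Surv(tstart, time, status) ~ ph.ecog:strata(tgroup) + age,
      data = lung2)

This gives a separate ph.ecog coefficient per time interval instead of one time-averaged coefficient, which is often easier to interpret than a tt() term.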
You can compare Cox regression models (coxph) in R with plrtest, which is a partial likelihood ratio test for non-nested coxph models:
require("survival")
require("nonnestcox") #github.com/thomashielscher/nonnestcox
pbc <- subset(pbc, !is.na(trt))
mod1 <- coxph(Surv(time, status==2) ~ age, data=pbc, x=T)
mod2 <- coxph(Surv(time, status==2) ~ age + albumin + bili + edema + protime, data=pbc, x=T)
mod3 <- coxph(Surv(time, status==2) ~ age + log(albumin) + log(bili) + edema +
log(protime), data=pbc, x=T)
plrtest(mod3, mod2, nested=F) # non-nested models
plrtest(mod3, mod1, nested=T) # nested models
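For the nested comparison you can also run the standard partial likelihood ratio test with anova() from the survival package itself, which serves as a cross-check on plrtest:

library(survival)

# Likelihood ratio test for the nested pair fitted above
anova(mod1, mod3)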
Best Answer
Thanks for trying those functions. I believe that both metrics you mentioned are excellent in this context. The approach is useful for any model that gives rise to Wald statistics (which is virtually all models), although likelihood ratio $\chi^2$ statistics would be even better (but more tedious to compute).
You can use the bootstrap to get confidence intervals for the ranks of variables computed these ways; for example code, type ?anova.rms. All of this is related to the "adequacy index". Two papers using the approach that have appeared in the medical literature are http://www.citeulike.org/user/harrelfe/article/13265566 and http://www.citeulike.org/user/harrelfe/article/13263849 .
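A minimal sketch of that ranking with the rms package (the lung model is only a placeholder; the bootstrap rank-interval example itself lives in ?anova.rms):

library(rms)
library(survival)

dd <- datadist(lung); options(datadist = "dd")

# Cox model fit with cph() so anova() reports Wald chi-square per predictor
fit <- cph(Surv(time, status) ~ age + ph.ecog + sex, data = lung)
a <- anova(fit)

print(a) # Wald chi-square, d.f., and p for each predictor
plot(a)  # dot chart of chi-square minus d.f., a rough importance ranking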