I am trying to fit a multi-state model using the coxph
function from the survival
package in R and I want to perform some model diagnostics. I can use the following code to fit an example of a multi-state model:
require(mstate)
require(survival)
data(ebmt3, package = "mstate")
temp = subset(ebmt3, select = -c(prtime, prstat, rfstime, rfsstat))
edata = tmerge(
temp,
ebmt3,
id,
rstat = event(rfstime, rfsstat),
pstat = event(prtime, prstat),
priorpr = tdc(prtime)
)
edata$event = with(edata, factor(pstat+2*rstat,0:2,labels=c("censor","PR","RelDeath")))
levels(edata$drmatch) = c("match","mismatch")
efit1 = coxph(Surv(tstart, tstop, event) ~ dissub+age+drmatch+tcd,id=id,data=edata,ties="breslow")
This gives three models for the $1 \to 2$, $1 \to 3$, and $2 \to 3$ transitions. From my understanding, each model for the transition $i \to j$ should (1) satisfy the proportional hazards assumption and (2) satisfy the Markov property, and I should also check for (3) influential points and (4) linearity. In a standard $\text{alive} \to \text{death}$ survival analysis model I'd check (1) using cox.zph
, (3) using the dfbeta residuals and (4) by making plots of the martingale residuals from the null model against the predictors I'm interesting in using.
My questions then are:
- Is there a simple way to do this within the
survival
package for a multi-state model, or a way to extract the individual models to perform the required analyses? When trying to useresiduals(efit1, type = "martingale")
the length of the resulting vector is greater than the number of observations in the original data, and thecollapse
argument inresiduals.coxph
is not clear. Similarly, usingresiduals(efit1, type = "dfbetas")
says that theresiduals method for multistate coxph objects is incomplete
. Usingcox.zph
seems to work but it would be nicer to apply onto the individual models. - Is there a way within the
survival
package to perform a hypothesis test for the Markov property? There's the functionMarkovTest
from themstate
package, but that's only for models built with themstate
functions rather thancoxph
.
Best Answer
I don't do much multi-state modeling, so there could be easier ways to do the following. Software-specific questions are off-topic here, but there is at least one important statistical point.
The reason that you have more martingale residuals than data values is that the residuals are calculated for each possible transition per individual. Your data frame with 3373 rows is based on 2204 unique individuals. The possible transitions, as you note, are between states $1 \to 2$, $1 \to 3$, and $2 \to 3$. The list of residuals includes 2204 for each of the first 2 possible transitions, which all individuals might in principle have undergone. The last transition requires that one starts in state $2$, called "PR" in this data set. Only 1169 individuals ever entered that state. 2204+2204+1169=5577, the length of
efit1$residuals
.So the martingale residuals for each transition are directly available, if you want them. You might have to pull them out yourself, however. The
collapse
argument tocoxph.residuals()
might have some usefulness in models with time-varying covariates, but its pooling information among all transitions for an individual doesn't seem to be what you want.For other residuals, you might have to include the original design matrix in the
coxphms
object, by includingx=TRUE
as an argument to thecoxph()
call. Following a hint on the help page forresiduals.coxph()
, I got no warning when I did that and got the expected matrices of scaled Schoenfeld and dfbeta residuals. Again, you might have to pull out those residuals yourself.The coefficients for each transition model are easily found by just typing
efit1
at the command prompt. The code forsurvival:::print.coxph
, in the loop startingif (inherits(x, "coxphms"))
, shows how to gather such information yourself if that printout isn't sufficient.In terms of Markov property checking, the
mstate
package does that with log-rank tests based on properly formatted data and a formula. That package does its model fitting viacoxph()
. The other functions are mostly for formatting data appropriately for the package to interpret (msprep
) and for displaying results ofcoxph()
models that have been processed through itsmsfit()
function. Themstate
package might make it easier to handle complicated state transition matrices, and as the multi-state survival vignette says in Section 6.1:So you might be best off using the
mstate
package for this type of work anyway, as it complements rather than replaces what thesurvival
package offers.