R – Model Diagnostics for Multi-State Models in Survival::coxph

cox-modeldiagnosticrresidualssurvival

I am trying to fit a multi-state model using the coxph function from the survival package in R and I want to perform some model diagnostics. I can use the following code to fit an example of a multi-state model:

require(mstate)
require(survival)
data(ebmt3, package = "mstate")
temp = subset(ebmt3, select = -c(prtime, prstat, rfstime, rfsstat))
edata = tmerge(
  temp,
  ebmt3,
  id,
  rstat = event(rfstime, rfsstat),
  pstat = event(prtime, prstat),
  priorpr = tdc(prtime)
)
edata$event = with(edata, factor(pstat+2*rstat,0:2,labels=c("censor","PR","RelDeath")))
levels(edata$drmatch) = c("match","mismatch")
efit1 = coxph(Surv(tstart, tstop, event) ~ dissub+age+drmatch+tcd,id=id,data=edata,ties="breslow")

This gives three models for the $1 \to 2$, $1 \to 3$, and $2 \to 3$ transitions. From my understanding, each model for the transition $i \to j$ should (1) satisfy the proportional hazards assumption and (2) satisfy the Markov property, and I should also check for (3) influential points and (4) linearity. In a standard $\text{alive} \to \text{death}$ survival analysis model I'd check (1) using cox.zph, (3) using the dfbeta residuals and (4) by making plots of the martingale residuals from the null model against the predictors I'm interesting in using.

My questions then are:

Is there a simple way to do this within the survival package for a multi-state model, or a way to extract the individual models to perform the required analyses? When trying to use residuals(efit1, type = "martingale") the length of the resulting vector is greater than the number of observations in the original data, and the collapse argument in residuals.coxph is not clear. Similarly, using residuals(efit1, type = "dfbetas") says that the residuals method for multistate coxph objects is incomplete. Using cox.zph seems to work but it would be nicer to apply onto the individual models.
Is there a way within the survival package to perform a hypothesis test for the Markov property? There's the function MarkovTest from the mstate package, but that's only for models built with the mstate functions rather than coxph.

Best Answer

I don't do much multi-state modeling, so there could be easier ways to do the following. Software-specific questions are off-topic here, but there is at least one important statistical point.

The reason that you have more martingale residuals than data values is that the residuals are calculated for each possible transition per individual. Your data frame with 3373 rows is based on 2204 unique individuals. The possible transitions, as you note, are between states $1 \to 2$, $1 \to 3$, and $2 \to 3$. The list of residuals includes 2204 for each of the first 2 possible transitions, which all individuals might in principle have undergone. The last transition requires that one starts in state $2$, called "PR" in this data set. Only 1169 individuals ever entered that state. 2204+2204+1169=5577, the length of efit1$residuals.

So the martingale residuals for each transition are directly available, if you want them. You might have to pull them out yourself, however. The collapse argument to coxph.residuals() might have some usefulness in models with time-varying covariates, but its pooling information among all transitions for an individual doesn't seem to be what you want.

For other residuals, you might have to include the original design matrix in the coxphms object, by including x=TRUE as an argument to the coxph() call. Following a hint on the help page for residuals.coxph(), I got no warning when I did that and got the expected matrices of scaled Schoenfeld and dfbeta residuals. Again, you might have to pull out those residuals yourself.

The coefficients for each transition model are easily found by just typing efit1 at the command prompt. The code for survival:::print.coxph, in the loop starting if (inherits(x, "coxphms")), shows how to gather such information yourself if that printout isn't sufficient.

In terms of Markov property checking, the mstate package does that with log-rank tests based on properly formatted data and a formula. That package does its model fitting via coxph(). The other functions are mostly for formatting data appropriately for the package to interpret (msprep) and for displaying results of coxph() models that have been processed through its msfit() function. The mstate package might make it easier to handle complicated state transition matrices, and as the multi-state survival vignette says in Section 6.1:

One current disadvantage of the survival package is that the Aalen-Johansen curves from a multi-state coxph model currently do not include a variance estimate, whereas those from mstate do have a variance.

So you might be best off using the mstate package for this type of work anyway, as it complements rather than replaces what the survival package offers.

Best Answer

Related Solutions

Solved – R: How to fit a coxph model with 2 categorical predictor variables that violate assumptions? Coefficient or Covariate

Multi-State Models – Extracting Transition-Specific Estimates from a Parametric Multi-State Survival Model

Related Question