R – Multi-State Cox Models: Different Formulas for Different Transitions

cox-model markov-process multi-state r survival

I'm using coxph from Therneau et al.'s survival package to model a multi-state system. The transitions are:

Baseline -> Disease
Disease -> Disease (re-hospitalization)
Baseline -> Death
Disease -> Death

(Death is an absorbing state.)

A key variable for predicting the Disease->Disease transition (re-hospitalization) is the number of days since the last hospitalization. (The covariate is time-varying: I split follow-up into 30-day intervals until the next disease recurrence, final censoring, or death.)
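A minimal base-R sketch of that 30-day splitting, assuming each at-risk spell starts at a hospitalization at time start and ends at time stop (recurrence, death, or censoring). The helper split_30() and its column names are hypothetical, not part of the survival package:

```r
# Hypothetical helper (not from any package): split one at-risk spell that
# starts at a hospitalization into 30-day counting-process intervals.
# Assumes stop > start; 'status' is the event indicator for the whole spell.
split_30 <- function(start, stop, status, width = 30) {
  cuts <- seq(start, stop, by = width)
  if (cuts[length(cuts)] < stop) cuts <- c(cuts, stop)  # close the last, shorter interval
  n <- length(cuts) - 1
  data.frame(
    tstart = cuts[-length(cuts)],
    tstop = cuts[-1],
    status = c(rep(0, n - 1), status),             # only the final interval can carry the event
    days_since_hosp = cuts[-length(cuts)] - start  # covariate value at the start of each interval
  )
}

split_30(start = 0, stop = 75, status = 1)
```

For a spell from day 0 to day 75 ending in a readmission, this gives the intervals (0, 30], (30, 60] and (60, 75] with days_since_hosp equal to 0, 30 and 60, and the event only on the last row. The survival package's survSplit() can do the interval splitting for a whole data set; a covariate like days_since_hosp would still have to be computed from the split start times.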

As you can imagine, days since the last hospitalization doesn't even exist for people in the Baseline state. This is a problem for the survival package's multi-state model because, at least as far as I understand its implementation, it uses the same set of variables to predict all of the transitions. (A slight caveat is that it can be forced to use the same coefficient for a variable across multiple transitions.)

Ideally, I'd like to remove the days-since-last-hospitalization term from the model for the Baseline->Disease transition. There are workarounds of sorts: I can set it to 0, but then I get singularities (e.g., when I check cox.zph: Error in solve.default(imat, u): system is computationally singular). I could then set it to 0 plus a little noise to prevent those singularities, but both approaches feel like technical workarounds that aren't strictly defensible.
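A simplified illustration of why the zero column causes trouble and why the noise trick only hides it. This is an analogy using the ordinary QR rank of a made-up covariate matrix, not the Cox information matrix itself: a covariate that is identically 0 within a transition carries no information about its transition-specific coefficient, so the matrix is rank-deficient, while tiny noise makes it numerically full rank without adding any real information.

```r
set.seed(42)
n <- 20
age <- rnorm(n, mean = 60, sd = 10)   # made-up covariate

# Days since last hospitalization forced to 0 for everyone on this transition
X_zero <- cbind(age, days = rep(0, n))
qr(X_zero)$rank    # 1: the 'days' column adds nothing, so the matrix is singular

# The "0 +/- noise" workaround: numerically full rank, but the column is pure noise
X_noise <- cbind(age, days = rnorm(n, mean = 0, sd = 1e-6))
qr(X_noise)$rank   # 2
```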

Is there a way to remove a variable for a specific transition (or set of transitions)? If not, is there a generally accepted way of handling variables that only apply for some initial states and not others?

Best Answer

This is not a direct solution to your issue, but you could try using the mstate package.

After transforming the data into long format with msprep(), you can use expand.covs() to create transition-specific covariates and add them to the dataset; you can then include each of them separately in your model.

Considering the 4 transitions listed above, your Cox model should look something like this:

    coxph(Surv(Tstart, Tstop, status) ~ cov.1 + cov.2 + cov.3 + cov.4 + dayslasthosp.2 + dayslasthosp.4 + strata(trans), data = msdata)

where the suffix .j identifies a covariate specific to transition j. The covariate 'days since last hospitalization' (dayslasthosp) is included only for transitions 2 and 4.
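To see what the expanded covariates look like, here is a minimal base-R sketch that mimics expand.covs() for one numeric covariate. The helper is hypothetical, not mstate's implementation, and it assumes the transitions are numbered in the order listed in the question (1 = Baseline->Disease, 2 = Disease->Disease, 3 = Baseline->Death, 4 = Disease->Death); the toy data are made up.

```r
# Hypothetical helper mimicking mstate::expand.covs() for one numeric
# covariate: for each requested transition k, add a column "<cov>.k" that
# equals the covariate on rows of transition k and 0 on all other rows.
expand_by_transition <- function(data, cov, transitions, trans_col = "trans") {
  for (k in transitions) {
    on_k <- data[[trans_col]] == k
    data[[paste0(cov, ".", k)]] <- ifelse(on_k, data[[cov]], 0)
  }
  data
}

# Made-up long-format data: one row per subject per transition at risk.
# Subject 1 is only ever in Baseline (transitions 1 and 3); subject 2 is
# later at risk from Disease (transitions 2 and 4), where dayslasthosp exists.
msdata <- data.frame(
  id           = c(1, 1, 2, 2, 2, 2),
  trans        = c(1, 3, 1, 3, 2, 4),
  dayslasthosp = c(NA, NA, NA, NA, 45, 45)
)

msdata <- expand_by_transition(msdata, "dayslasthosp", transitions = c(2, 4))
msdata
```

Because dayslasthosp.2 and dayslasthosp.4 are 0 on every row outside their own transition and the model is stratified by trans, the Baseline transitions get no coefficient for days since last hospitalization; the original column, with its NAs, is simply left out of the formula.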

The vignettes and tutorials available with the package are very informative and guide you step by step through cases like this.

Related Solutions

Solved – Multi-State Numerical overflow using MSM package in R

I have had the same problem. In my case I solved it by first differencing the observation times within each subject and taking the minimum difference. That can be done using aggregate():

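    diffs <- aggregate(Years ~ IDNumber, data = sampleData, FUN = function(x) min(diff(sort(x))))  # smallest gap per subject
    names(diffs)[names(diffs) == "Years"] <- "minDiff"  # rename so it doesn't clash with Years when merging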

If you then merge this with your data, you can easily exclude subjects with very small time differences between consecutive observations.

    MD_Death2 <- merge(sampleData, diffs, by = "IDNumber", all.x = TRUE)

In my case, excluding subjects with less than 0.01 years between two observations (which were likely errors) solved the problem. If it still does not work, you might also try increasing the fnscale option further (and the maximum number of iterations if necessary), e.g., pass

    control=list(fnscale=5000,maxit=500)

to msm().
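Putting the steps together on a tiny made-up data set (the column names IDNumber and Years follow the code above; the values are invented), including the filtering step described in the text:

```r
# Made-up data: subject 3 has two observations only 0.005 years apart
sampleData <- data.frame(
  IDNumber = c(1, 1, 1, 2, 2, 3, 3),
  Years    = c(0, 1.0, 2.1, 0, 0.5, 0, 0.005)
)

# Smallest gap between consecutive observation times within each subject
diffs <- aggregate(Years ~ IDNumber, data = sampleData,
                   FUN = function(x) min(diff(sort(x))))
names(diffs)[names(diffs) == "Years"] <- "minDiff"

# Attach each subject's minimum gap to all of that subject's rows
MD_Death2 <- merge(sampleData, diffs, by = "IDNumber", all.x = TRUE)

# Drop subjects with observations less than 0.01 years apart (likely errors)
MD_Death2 <- subset(MD_Death2, minDiff >= 0.01)
MD_Death2
```

Subjects with a single observation would get minDiff = Inf (min() of an empty vector, with a warning), so this filter keeps them.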

R – Model Diagnostics for Multi-State Models in Survival::coxph

I don't do much multi-state modeling, so there could be easier ways to do the following. Software-specific questions are off-topic here, but there is at least one important statistical point.

The reason that you have more martingale residuals than data values is that the residuals are calculated for each possible transition per individual. Your data frame with 3373 rows is based on 2204 unique individuals. The possible transitions, as you note, are between states $1 \to 2$, $1 \to 3$, and $2 \to 3$. The list of residuals includes 2204 for each of the first 2 possible transitions, which all individuals might in principle have undergone. The last transition requires that one starts in state $2$, called "PR" in this data set. Only 1169 individuals ever entered that state. 2204+2204+1169=5577, the length of efit1$residuals.
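The counting can be checked with a small sketch. Following the explanation above, it assumes one residual per subject for each transition out of a state that subject ever occupied; the subjects and states below are made up, and the state numbers follow the 1 -> 2, 1 -> 3, 2 -> 3 layout.

```r
# Made-up data: the transient states each subject ever occupied
# (everyone starts in state 1; subjects 2 and 3 also reach state 2)
occupied <- list(c(1), c(1, 2), c(1, 2), c(1))

# Possible transitions, as in the example: 1 -> 2, 1 -> 3, 2 -> 3
transitions <- data.frame(from = c(1, 1, 2), to = c(2, 3, 3))

# One residual per subject for every transition leaving a state they occupied
transitions$n_resid <- vapply(
  transitions$from,
  function(s) sum(vapply(occupied, function(states) s %in% states, logical(1))),
  integer(1)
)
transitions
sum(transitions$n_resid)   # 4 + 4 + 2 = 10, analogous to 2204 + 2204 + 1169
```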

So the martingale residuals for each transition are directly available, if you want them, although you might have to pull them out yourself. The collapse argument to residuals.coxph() might be useful in models with time-varying covariates, but its pooling of information across all transitions for an individual doesn't seem to be what you want.

For other residuals, you might have to include the original design matrix in the coxphms object, by including x=TRUE as an argument to the coxph() call. Following a hint on the help page for residuals.coxph(), I got no warning when I did that and got the expected matrices of scaled Schoenfeld and dfbeta residuals. Again, you might have to pull out those residuals yourself.

The coefficients for each transition model are easily found by just typing efit1 at the command prompt. The code for survival:::print.coxph, in the loop starting if (inherits(x, "coxphms")), shows how to gather such information yourself if that printout isn't sufficient.

In terms of Markov property checking, the mstate package does that with log-rank tests based on properly formatted data and a formula. That package does its model fitting via coxph(). The other functions are mostly for formatting data appropriately for the package to interpret (msprep) and for displaying results of coxph() models that have been processed through its msfit() function. The mstate package might make it easier to handle complicated state transition matrices, and as the multi-state survival vignette says in Section 6.1:

One current disadvantage of the survival package is that the Aalen-Johansen curves from a multi-state coxph model currently do not include a variance estimate, whereas those from mstate do have a variance.

So you might be best off using the mstate package for this type of work anyway, as it complements rather than replaces what the survival package offers.
