I think that the apparent discrepancy between the text and your professor comes down to the truncated (and thus missing) "observations" themselves versus the implications for how you handle the data that you do have.
Yes, a truncated "observation" is one that is unavailable to the study because its value is out of range. But you don't have those truncated "observations" to work with at all. What you have is a sample of non-truncated observations. Wikipedia puts it nicely:
A truncated sample can be thought of as being equivalent to an underlying sample with all values outside the bounds entirely omitted, with not even a count of those omitted being kept.
Your professor's emphasis is on how you analyze the data that you do have. From that perspective, in the example case you have to treat the age values of the observations you have as left-truncated, as the sample provides no information about ages below the threshold. That applies to the entire sample at hand.
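For example (a minimal sketch, assuming the R survival package and hypothetical column names entry_age and event_age), left truncation on the age time scale is handled with the two-time form of Surv(), so that each participant contributes information only from study entry onward:

library(survival)

## Hypothetical data: age at study entry (all past the > 60 cutoff),
## age at event or censoring, and an event indicator.
dat <- data.frame(
  entry_age = c(62, 65, 70),
  event_age = c(70, 68, 75),
  event     = c(1, 0, 1)
)

## Surv(time1, time2, event) treats each subject as left-truncated
## at entry_age; they enter the risk set only at that age.
fit <- survfit(Surv(entry_age, event_age, event) ~ 1, data = dat)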
Section 5.3 of my edition of your text explains a standard situation that leads to right truncation: when you enroll participants into a study only after they have developed some disease. In that case, their times from some initiating cause (like an initial infection) to the event of developing overt disease provide no information about individuals who might have a longer time between initiating cause and overt disease. The example there is for individuals who developed AIDS following blood transfusions.
A mixed situation: time-varying covariates
In the provided example of a cutoff of > 60 years of age to be entered into a study, the entire sample needs to be treated as left-truncated if time = 0 is treated as date of birth. In other circumstances you need to evaluate each observation in your sample with respect to whether it needs to be treated as truncated or censored.
This can happen with time-varying covariates handled via a counting process, as in the R coxph() function. Say that an individual starts at time = 0 with a categorical covariate having level A and stays event-free until time = 5, at which time that covariate changes to level B; remains event-free until time = 7, when the covariate changes to level C; and has the event at time = 9 with the covariate still at level C. The data for that individual might be coded as follows:
start  stop  covariate  event
    0     5          A      0
    5     7          B      0
    7     9          C      1
Now you have to think about truncation and censoring for each time period for the individual. For the first time period you have simple right censoring at time = 5, as you are starting from the reference time = 0 and there is no event. It contains information with respect to a covariate value of A for the entire period from the time = 0 reference through time = 5, so there is no left truncation. The second time period is also right censored, here at time = 7, as there is no event then; it is treated as left-truncated, as the data provide no information about a covariate level of B prior to time = 5. The third period similarly provides no information about a covariate level of C prior to time = 7, so it is left-truncated, with the (uncensored) event at time = 9.
So I suppose it's best to think about truncation on a data-value by data-value basis: for what time periods do the data provide information? In some circumstances all the data values need to be considered truncated, as in the left-truncation example of the retirement home or the right-truncation example where only those with the event are included in the sample. But in other situations you need to proceed more cautiously.
What's typically of interest in survival analysis is the process underlying the distribution of events in time. The tools of survival analysis are designed to deal with missing (censored) event times in a way that provides reliable estimates of the event process itself.
What you describe is a model of whether you observe an event or a censoring time. By itself it wouldn't directly describe the event process if censoring is informative. The review by Leung et al. provides an introduction to the problems posed by making incorrect assumptions about censoring.
Modeling the distribution of censoring times, however, can be a first step toward dealing with potentially informative censoring. As Tutz and Schmid explain in Chapter 4, inversely weighting cases with respect to their probabilities of having been censored over time (inverse probability of censoring weighting, IPCW) can "'reconstruct' the characteristics of the unknown [due to censoring] full data sample by using the weights" (page 89).
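As a rough sketch of that idea (my own illustration, not Tutz and Schmid's code), the simplest IPCW weights come from a Kaplan-Meier estimate of the censoring distribution, obtained by reversing the status indicator so that censoring becomes the "event"; dat, time, and status below are hypothetical names:

library(survival)

## Estimate the censoring distribution G(t) by treating censoring as
## the event of interest (status = 1 is an event, 0 is censored).
G_fit <- survfit(Surv(time, 1 - status) ~ 1, data = dat)

## Step function for the Kaplan-Meier estimate of P(uncensored past t).
G <- stepfun(G_fit$time, c(1, G_fit$surv))

## Observed events are up-weighted by the inverse probability of
## having remained uncensored until their event time; subjects
## censored at their own time contribute weight 0 there.
w <- with(dat, ifelse(status == 1, 1 / G(time), 0))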
Although Tutz and Schmid present IPCW in the context of estimating prediction errors, in some circumstances it can correct for informative censoring. Robins and Finkelstein illustrate this in "Correcting for Noncompliance and Dependent Censoring in an AIDS Clinical Trial with Inverse Probability of Censoring Weighted (IPCW) Log-Rank Tests," Biometrics 56: 779-788 (2000).
Hernán and Robins devote much of their book "Causal Inference: What If" to the problems introduced by censoring. Chapters 8 and 12 explain the problems for analysis that doesn't explicitly involve time (e.g., continuous or binary outcomes when some individuals are lost to follow-up); Chapters 21 and 22 cover situations where censoring is a function of time, as in survival analysis.