Survival Analysis Methods – Estimating Survival Function Under Interval Censoring

interval-censoring survival

I'm trying to learn about methods for conducting survival analysis when the data consists of, for example, yearly tests. In other words, when it is discovered that an event has occurred, it's only known that it occurred sometime in the last year. Right censoring would also typically be present.

I'm finding it a little hard to piece together the concepts I'm researching. I have a few questions so I'll just list them out.

  1. What is the "thing" you're trying to overcome when you have interval censoring? Obviously the issue is that you don't know when exactly the event occurred, but what does a successful approach accomplish that, say, using a Kaplan-Meier estimate at the midpoint of the intervals doesn't? Particularly, what is the conceptual approach that the nonparametric estimators take?
  2. What are the current approaches to dealing with this (parametric, semi-parametric, nonparametric)? I know that the Turnbull method is used as a nonparametric estimator of the survival function. What is the advantage to a parametric method or a nonparametric method in this case?
  3. Do approaches change if the goal is to obtain the survival function vs. the hazard function vs. the cumulative hazard function? Although they are related it may sometimes be hard to convert between them, so if the goal is to have the hazard function in hand then what is a preferable approach?
  4. If you could say anything briefly about how covariates are incorporated for different methods, that would also be helpful.

Best Answer

Question 1. The "thing" you're trying to overcome is potential bias in the estimates. With yearly tests that probably span only a few years, each interval is a large fraction of the total follow-up, so the bias could well be a problem. The review by Leung et al. (Annu. Rev. Public Health 1997, 18:83–104) has an example (Figure 5) of a similar situation, in which a Kaplan-Meier estimate that assumed the event occurred at the time of the scheduled clinic visit showed substantial bias relative to the Turnbull estimate.

In fairness, even with a survival time in days there is still some interval of uncertainty. The question is whether the interval is a large enough fraction of the times in question to lead to problems.

Question 2. Parametric methods work well because interval-censored observations make well-defined contributions to the likelihood used for fitting the models: an event known only to lie in $(L, R]$ contributes $S(L)-S(R)$, and a right-censored observation contributes $S(L)$. The question is whether you have chosen the correct parametric model. Turnbull's non-parametric maximum likelihood estimator is still used when you want a non-parametric estimate.
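To make the likelihood contributions concrete, here is a minimal sketch in plain Python, assuming an exponential model $S(t)=\exp(-\lambda t)$. The function names, the search bracket `lo`/`hi`, and the data are all hypothetical, chosen only for illustration; this is not the API of any package. Each event known to lie in `(left, right]` contributes `S(left) - S(right)`, and each right-censored observation (`right = math.inf`) contributes `S(left)`.

```python
import math

def log_likelihood(rate, intervals):
    """Exponential-model log-likelihood for observations censored into (left, right].

    right = math.inf marks a right-censored observation (event after `left`).
    """
    total = 0.0
    for left, right in intervals:
        total += -rate * left  # log S(left)
        if not math.isinf(right):
            # Adding log(1 - exp(-rate * (right - left))) gives log(S(left) - S(right)).
            total += math.log(-math.expm1(-rate * (right - left)))
    return total

def fit_rate(intervals, lo=1e-6, hi=10.0, tol=1e-8):
    """Golden-section search for the rate that maximizes the log-likelihood.

    Assumes the maximum lies inside the bracket [lo, hi] (an illustrative default).
    """
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if log_likelihood(c, intervals) > log_likelihood(d, intervals):
            b = d  # maximum is in [a, d]
        else:
            a = c  # maximum is in [c, b]
    return (a + b) / 2

# Hypothetical yearly-test data: (last negative test, first positive test),
# with math.inf when no event was seen by the last test.
yearly = [(0, 1), (1, 2), (1, 2), (2, 3), (3, math.inf), (3, math.inf)]
rate = fit_rate(yearly)
print(f"estimated rate: {rate:.4f}, S(2) = {math.exp(-2 * rate):.4f}")
```

The simple one-dimensional search is adequate here because this log-likelihood is concave in the rate. A model with more parameters, such as a Weibull, would need a general-purpose optimizer.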

For an informed introduction to issues in handling interval censoring, I'd recommend that you examine the vignette for the R icenReg package. Semi-parametric models with interval censoring require estimation of the baseline hazard as part of the modeling, unlike Cox models with only right censoring. Methods for semi-parametric proportional-hazard and proportional-odds models with interval-censored data are available, but they are more computationally intensive than fully parametric modeling.

Question 3. For continuous-time survival $S(t)$, $S(t)=\exp(-H(t))$, where $H(t)=\int_0^t h(\tau)\,d\tau$ is the cumulative hazard for the instantaneous hazard $h(t)$. There really is no problem interconverting, and I don't think there's anything specific to interval censoring here. A parametric model gives a continuous function $S(t)$, but that won't be valid if you chose an incorrect parametric form. For semi-parametric models it's not clear that you can get confidence intervals for the baseline survival; the icenReg vignette says (page 7): "For the semi-parametric models...to our knowledge, even using the bootstrap error estimates for the baseline distribution is not valid." But that's a problem for all of the baselines (survival, hazard, cumulative hazard). If you can get one, you can get them all.
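As a quick numerical check of these relations, the sketch below starts from a hazard, integrates it to get $H(t)$, exponentiates to get $S(t)$, and then recovers the hazard again as $-\frac{d}{dt}\log S(t)$. The Weibull form and its parameter values are assumptions picked purely for illustration, and the function names are hypothetical.

```python
import math

SHAPE, SCALE = 1.5, 4.0  # hypothetical Weibull parameters

def hazard(t):
    """Weibull hazard h(t) = (k / lam) * (t / lam) ** (k - 1)."""
    return (SHAPE / SCALE) * (t / SCALE) ** (SHAPE - 1)

def cumulative_hazard(t, steps=10_000):
    """H(t): integral of h from 0 to t, approximated by the midpoint rule."""
    width = t / steps
    return sum(hazard((i + 0.5) * width) for i in range(steps)) * width

def survival(t):
    """S(t) = exp(-H(t))."""
    return math.exp(-cumulative_hazard(t))

def hazard_from_survival(t, eps=1e-3):
    """h(t) = -d/dt log S(t), approximated by a central difference."""
    return -(math.log(survival(t + eps)) - math.log(survival(t - eps))) / (2 * eps)

t = 3.0
print("H(t) numeric vs closed form:", cumulative_hazard(t), (t / SCALE) ** SHAPE)
print("h(t) direct vs recovered from S:", hazard(t), hazard_from_survival(t))
```

With a smooth parametric fit the round trip is painless. With a step-function estimate such as Turnbull's, $S(t)$ and $H(t)$ are still directly available, but the hazard requires some smoothing because the derivative of a step function is not useful as it stands.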

Question 4. Covariates are included in semi-parametric and parametric models in the same functional forms as they are for analysis without interval censoring. For non-parametric approaches, you can draw separate non-parametric Turnbull survival curves for each of several groups defined by covariate values, similar to Kaplan-Meier curves. The R interval package provides ways to test differences between interval-censored survival curves, described in the package vignette.
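For a feel of what a per-group Turnbull-style estimate involves, here is a simplified self-consistency (EM) sketch in plain Python. It is not the implementation used by icenReg or interval. As a simplification it spreads probability over every cell between consecutive observed endpoints rather than only Turnbull's innermost intervals, it runs a fixed number of iterations instead of checking convergence, and it assumes every observation has `left < right`. The group names and data are hypothetical.

```python
import math

def turnbull(intervals, iterations=500):
    """Self-consistency estimate of the probability mass in each cell.

    intervals: (left, right] pairs with left < right; right = math.inf
    marks a right-censored observation.
    """
    points = sorted({p for pair in intervals for p in pair})
    cells = list(zip(points[:-1], points[1:]))
    # covers[i][j] is True when cell j lies inside observation i's interval.
    covers = [[left <= lo and hi <= right for lo, hi in cells]
              for left, right in intervals]
    mass = [1.0 / len(cells)] * len(cells)
    for _ in range(iterations):
        new_mass = [0.0] * len(cells)
        for row in covers:
            # Share each observation across the cells it covers,
            # in proportion to the current mass of those cells.
            denom = sum(m for m, inside in zip(mass, row) if inside)
            for j, inside in enumerate(row):
                if inside:
                    new_mass[j] += mass[j] / denom / len(intervals)
        mass = new_mass
    return cells, mass

def survival_at(t, cells, mass):
    """S(t) at a cell endpoint t: total mass in cells starting at or after t."""
    return sum(m for (lo, _), m in zip(cells, mass) if lo >= t)

# Hypothetical yearly-test data, one list per covariate-defined group.
groups = {
    "treated": [(0, 2), (1, 3), (2, math.inf), (3, math.inf)],
    "control": [(0, 1), (0, 2), (1, 2), (2, math.inf)],
}
for name, data in groups.items():
    cells, mass = turnbull(data)
    curve = [(t, round(survival_at(t, cells, mass), 3)) for t in (1, 2, 3)]
    print(name, curve)
```

The estimate is only defined at cell endpoints; within a cell the data say nothing about where the mass sits, which is the main conceptual difference from imputing a midpoint and running Kaplan-Meier.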
