R – Right Censored Survival Analysis with Interval Data in R

interval-censoring, r, survival

How do you perform a survival analysis on a data set that is right censored (i.e. some samples are removed before failure) and where the measurement points are irregular, discrete intervals (i.e. failure occurred somewhere between Left and Right, or not at all)?

Example data set:

| sample | Left (hours) | Right (hours) | Fail/Censored |
|--------|--------------|---------------|---------------|
| A      | 10           | 20            | F             |
| B      | 40           | 60            | F             |
| C      | 40           | 60            | C             |

I am comfortable using R, but I do not have much experience with this branch of statistics. I would appreciate it if someone could provide an R example or point me in the right direction.

Best Answer

As implied by the tag, this is referred to as interval censoring. In this scenario, for each observation we have a pair of values that I call the observation interval, $(L_i, R_i]$, where $L_i$ is a lower bound on the true value of interest and $R_i$ is an upper bound on it. For example, suppose you go to a dentist at age 9 and have no cavities. You go again at age 12 and have one cavity. Then all you know about your "age at first cavity" is that it lies in the interval $(9, 12]$.
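In R, the `survival` package (shipped with standard R installations) can hold this kind of observation with `Surv(..., type = "interval2")`, where each row is a (lower, upper) pair and an infinite upper limit marks right censoring. A minimal sketch encoding the dentist example, plus a second, hypothetical person who still had no cavity at age 15 and is therefore right-censored:

```r
library(survival)

# Row 1: the dentist example, first cavity somewhere in (9, 12].
# Row 2: hypothetical person, cavity-free at the last visit (age 15),
#        so all we know is that the event lies in (15, Inf).
s <- Surv(time  = c(9, 15),
          time2 = c(12, Inf),
          type  = "interval2")
print(s)

# Surv() stores a status code per row; for interval2 data
# 3 = interval-censored and 0 = right-censored.
status <- unclass(s)[, 3]
print(status)
```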

Note: you will need to slightly change your data to put it in this format. For example, you have the interval $(40, 60]$ for subject C but you also state that they are right censored. I assume this means that we know that for subject C, at time 60 the event of interest had not occurred yet. In that case, you should represent their observation interval as $(60, \infty)$.
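A minimal base-R sketch of that recoding for the example table, assuming (as above) that "C" means the sample was still event-free at its Right time. The column name `Status` is my own stand-in for the "Fail/Censored" column:

```r
# The example data from the question
raw <- data.frame(
  sample = c("A", "B", "C"),
  Left   = c(10, 40, 40),
  Right  = c(20, 60, 60),
  Status = c("F", "F", "C")   # F = failed in (Left, Right], C = censored
)

# Failures keep their bracketing inspection times (Left, Right];
# censored samples become (Right, Inf), since they were known to be
# event-free at their last inspection.
ic <- with(raw, data.frame(
  sample = sample,
  L = ifelse(Status == "C", Right, Left),
  R = ifelse(Status == "C", Inf, Right)
))
print(ic)
```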

There is a variety of tools for analyzing interval-censored data. One of the most basic is the non-parametric maximum likelihood estimator (NPMLE, often called the Turnbull estimator), which is essentially an extension of the Kaplan-Meier curve that allows for interval censoring. For hypothesis tests comparing two groups, the log-rank statistics have been generalized to allow for interval censoring. Finally, survival regression models (proportional hazards, accelerated failure time and proportional odds, to name a few) can be fit to interval-censored data.
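For a quick look without extra packages, `survfit()` from the `survival` package also computes the NPMLE: when given interval-censored `Surv(..., type = "interval2")` data it uses Turnbull's EM algorithm. This is a swapped-in alternative to the packages mentioned here, sketched on the question's three samples after recoding C as right-censored at 60 hours, as described above:

```r
library(survival)

# Observation intervals for the question's data:
# A failed in (10, 20], B failed in (40, 60], C was still running at 60.
ic <- data.frame(
  sample = c("A", "B", "C"),
  L = c(10, 40, 60),
  R = c(20, 60, Inf)
)

# With interval-censored input, survfit() estimates the survival
# curve by the Turnbull NPMLE rather than plain Kaplan-Meier.
fit <- survfit(Surv(L, R, type = "interval2") ~ 1, data = ic)

# Estimated survival probability at each time where the curve changes
print(data.frame(time = fit$time, surv = fit$surv))
```

With only three samples the curve is mostly illustrative; the NPMLE can only assign probability mass to the intervals themselves, not to exact times within them.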

In R, the NPMLE and regression models can be found in my icenReg package, and the generalized log-rank tests in the interval package.
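The sketch below shows how the three analyses might look with those packages. It uses simulated data (hypothetical: two groups with exponential failure times, inspected every 10 hours up to 100 hours) because the three-row example is too small to fit anything meaningful. The calls follow the packages' documented interfaces (`ic_np()` and `ic_sp()` in icenReg, `ictest()` in interval), but check the current manuals for defaults:

```r
library(survival)  # Surv(), used in the interval package's formula interface
library(icenReg)   # ic_np(), ic_sp()
library(interval)  # ictest()

set.seed(1)

# Hypothetical simulated data: two groups with exponential failure
# times, inspected every 10 hours until a final inspection at 100 hours.
n      <- 100
group  <- rep(c("ctrl", "trt"), each = n / 2)
t_true <- rexp(n, rate = ifelse(group == "ctrl", 1 / 40, 1 / 70))
last   <- 100

# L = last inspection before failure, R = first inspection after it;
# units still running at the final inspection get (100, Inf).
L <- ifelse(t_true > last, last, floor(t_true / 10) * 10)
R <- ifelse(t_true > last, Inf, L + 10)
sim <- data.frame(L = L, R = R, group = factor(group))

# Nonparametric MLE of the survival curve, stratified by group
np_fit <- ic_np(cbind(L, R) ~ group, data = sim)
plot(np_fit)

# Semiparametric proportional hazards model;
# bootstrap samples are used for the standard errors
ph_fit <- ic_sp(cbind(L, R) ~ group, model = "ph",
                bs_samples = 100, data = sim)
summary(ph_fit)

# Generalized log-rank test for a difference between the groups
lr_test <- ictest(Surv(L, R, type = "interval2") ~ group, data = sim)
print(lr_test)
```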
