Solved – Plotting interval censored follow-up time as a line chart

data-visualization, r, survival

I'm working on a survival analysis project where it would be useful to visualize everyone's follow-up time and event times. The data consist of an ID, which of two possible events each person had (A or B), the time at which they had the event, and whether or not they are interval censored. Non-censored individuals have two identical event times, t1 and t2, while censored individuals have t1 =/= t2, giving the lower and upper bounds of their censored interval.

This looks like so (all made up):

ID, eventA, eventB, t1, t2, censored
1, 0, 1, 7, 7, 0
2, 1, 0, 5, 5, 0 
3, 1, 0, 10, 10, 0
4, 0, 1, 4.5, 4.5, 0
5, 1, 0, 2, 8, 1

Where ID 5 is censored.

I'd like to produce a plot like this:

[Image: example of the desired plot]

Essentially, make the X-axis ID, the Y axis time. Draw a line segment from 0 to t1. Then if they're censored, draw another line segment from t1 to t2. Presumably if they're uncensored they'll just overlap, which is fine. Then drop a marker at t2 for each type of event, say an X if they're eventA and O if they're eventB.

I presume there's a way to do this in R, but I haven't yet really wrapped my mind around graphing in R, and most of the examples for graphs in survival analysis I'm finding are KM curves. Anyone have an idea?

Best Answer

There must be many ways to make follow-up time plots with interval censored data, although a quick Google search only found this image in an overview of censoring, which looks a bit busy to my eye.

Just to give another perspective, here's an approach using the ggplot2 package.

library(ggplot2)

# Your example data
dat <- structure(list(ID = 1:5, eventA = c(0L, 1L, 1L, 0L, 1L), 
    eventB = c(1L, 0L, 0L, 1L, 0L), t1 = c(7, 5, 10, 4.5, 2), t2 = c(7, 5, 10, 4.5, 
    8), censored = c(0, 0, 0, 0, 1)), .Names = c("ID", "eventA", 
    "eventB", "t1", "t2", "censored"), class = "data.frame", row.names = c(NA, -5L))

# Create event variable
dat$event <- with(dat, ifelse(eventA, "A", "B"))

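# Create id.ordered, a factor whose levels are the IDs sorted by decreasing t2
# This will allow the plot to be ordered by increasing t2, if desired
# (index dat$ID with order() so this also works when IDs are not 1, 2, ..., n)
dat$id.ordered <- factor(x = dat$ID, levels = dat$ID[order(dat$t2, decreasing = TRUE)])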

# Use ggplot to plot data from dat object
ggplot(dat, aes(x = id.ordered)) + 
    # Plot solid line representing non-interval censored time from 0 to t1
    geom_linerange(aes(ymin = 0, ymax = t1)) + 
    # Plot line (dotted for censored time) representing time from t1 to t2
    geom_linerange(aes(ymin = t1, ymax = t2, linetype = as.factor(censored))) +  
    # Plot points representing event
    # The ifelse() function moves censored marker to middle of interval
    geom_point(aes(y = ifelse(censored, t1 + (t2 - t1) / 2, t2), shape = event), 
        size = 4) +
    # Flip coordinates
    coord_flip() + 
    # Add custom name to linetype scale, 
    # otherwise it will default to "as.factor(censored)"
    scale_linetype_manual(name = "Censoring", values = c(1, 2), 
        labels = c("Not censored", "Interval censored")) +
    # Add custom shape scale.  Change the values to get different shapes.
    scale_shape_manual(name = "Event", values = c(19, 15)) +
    # Add main title and axis labels
    ggtitle("Patient follow-up") + xlab("Patient ID") +  ylab("Days") + 
    # I think the bw theme looks better for this graph, 
    # but leave it out if you prefer the default theme
    theme_bw()

And the result:

[Image: Patient follow-up with interval censored data]

When making graphs with line-type, color, size, etc. conditional on data, I find ggplot2 more intuitive than base graphics and even trellis, although trellis is much faster when plotting bigger data.
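For comparison, here is a rough base-graphics sketch of the same layout, assuming the example data from the question. The helper name plot_followup_base() is made up for illustration; it relies only on plot(), axis(), segments(), points() and legend() from base R, and it returns the computed positions invisibly so they can be inspected.

```r
# Example data from the question
dat <- data.frame(
    ID = 1:5,
    eventA = c(0, 1, 1, 0, 1),
    eventB = c(1, 0, 0, 1, 0),
    t1 = c(7, 5, 10, 4.5, 2),
    t2 = c(7, 5, 10, 4.5, 8),
    censored = c(0, 0, 0, 0, 1)
)

# Hypothetical helper (not from any package): one horizontal line per patient
plot_followup_base <- function(d) {
    d <- d[order(d$t2, decreasing = TRUE), ]   # largest t2 ends up at the bottom
    row <- seq_len(nrow(d))                    # vertical position of each patient
    is.cens <- d$censored == 1

    # Empty plotting region, with patient IDs as y-axis labels
    plot(NA, xlim = c(0, max(d$t2)), ylim = c(0.5, nrow(d) + 0.5),
         xlab = "Days", ylab = "Patient ID", yaxt = "n",
         main = "Patient follow-up")
    axis(2, at = row, labels = d$ID, las = 1)

    # Solid segment from 0 to t1 for everyone
    segments(0, row, d$t1, row, lty = 1)
    # Dashed segment from t1 to t2 for interval censored patients only
    if (any(is.cens)) {
        segments(d$t1[is.cens], row[is.cens], d$t2[is.cens], row[is.cens], lty = 2)
    }

    # Event marker: midpoint of the interval if censored, otherwise t2
    marker <- ifelse(is.cens, (d$t1 + d$t2) / 2, d$t2)
    points(marker, row, pch = ifelse(d$eventA == 1, 19, 15), cex = 1.5)

    legend("topright", bty = "n",
           legend = c("Event A", "Event B", "Not censored", "Interval censored"),
           pch = c(19, 15, NA, NA), lty = c(NA, NA, 1, 2))

    invisible(data.frame(ID = d$ID, row = row, marker = marker))
}

res <- plot_followup_base(dat)
```

Each aesthetic that ggplot2 maps from a column (line type, shape) has to be computed by hand here, which is the extra bookkeeping the comparison above refers to.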

I'm not sure if it is preferred to place the event marker in the middle of the censored interval or at the end. I chose the middle here, to emphasize that the event did not necessarily occur near the end of follow-up.
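To make the two placements concrete, here is a small sketch that computes both marker positions for a non-censored and a censored row. The column names marker.mid and marker.end are invented for this example; mapping y to one or the other in geom_point() switches between the two conventions.

```r
# Two rows from the example: one exact event time, one interval censored
d <- data.frame(ID = c(1, 5), t1 = c(7, 2), t2 = c(7, 8), censored = c(0, 1))

# Midpoint placement (what the plot above uses): middle of [t1, t2] if censored
d$marker.mid <- with(d, ifelse(censored == 1, t1 + (t2 - t1) / 2, t2))

# End placement: always at the upper bound t2
d$marker.end <- d$t2

d
```

For non-censored rows the two columns agree, since t1 equals t2; they differ only for interval censored rows.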

Addendum

Once you decide on a standard plot, and if you find yourself making these plots often, it can be convenient to wrap it up in a function and use R's S3 object system to dispatch the plotting method with a call to the plot() generic. The following does not relate directly to the original question, but for the sake of other readers interested in adding methods to the plot() generic I'll include it here. First, wrap up the ggplot call into a plot method function:

plot.interval.censored <- function(x, title = "Patient follow-up", 
        xlab = "Patient ID", ylab = "Days",
        linetype.values = c(1, 2), shape.values = c(19, 15), ...)
{
    # Work on the argument x, not on a global object
    x$event <- with(x, ifelse(eventA, "A", "B"))
    x$id.ordered <- factor(x = x$ID, levels = x$ID[order(x$t2, decreasing = TRUE)])

    out <- ggplot(x, aes(x = id.ordered)) + 
            geom_linerange(aes(ymin = 0, ymax = t1)) + 
            geom_linerange(aes(ymin = t1, ymax = t2, linetype = as.factor(censored))) +  
            geom_point(aes(y = ifelse(censored, t1 + (t2 - t1) / 2, t2), shape = event), 
                    size = 4) +
            coord_flip() + 
            scale_linetype_manual(name = "Censoring", values = linetype.values, 
                    labels = c("Not censored", "Interval censored")) +
            scale_shape_manual(name = "Event", values = shape.values) +
            ggtitle(title) + xlab(xlab) +  ylab(ylab) + 
            theme_bw()

    return(out)
}

Then, add interval.censored to the class of your data object:

class(dat) <- c("interval.censored", class(dat))
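If you set the class in more than one place, a small constructor can check the data first. The following is only a sketch: as_interval_censored() is a hypothetical helper name, and the checks (required columns, t1 not exceeding t2) are assumptions about what a valid data set should look like.

```r
# Hypothetical constructor (name and checks chosen for illustration only)
as_interval_censored <- function(d) {
    required <- c("ID", "eventA", "eventB", "t1", "t2", "censored")
    missing.cols <- setdiff(required, names(d))
    if (length(missing.cols) > 0) {
        stop("missing columns: ", paste(missing.cols, collapse = ", "))
    }
    if (any(d$t1 > d$t2)) {
        stop("t1 must not exceed t2")
    }
    # Prepend the class only once, keeping "data.frame" for inheritance
    if (!inherits(d, "interval.censored")) {
        class(d) <- c("interval.censored", class(d))
    }
    d
}

dat <- data.frame(ID = 1:2, eventA = c(0, 1), eventB = c(1, 0),
                  t1 = c(7, 2), t2 = c(7, 8), censored = c(0, 1))
dat <- as_interval_censored(dat)
class(dat)
```

Keeping "data.frame" in the class vector means the object still behaves like an ordinary data frame everywhere else.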

Now, the plot can be produced simply by calling:

plot(dat)
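The dispatch mechanism itself can be seen in isolation with a toy class that needs no plotting package. Everything named demo.class below is made up for this illustration.

```r
# Toy method for a made-up class "demo.class", to show dispatch only
plot.demo.class <- function(x, ...) {
    invisible("plot.demo.class was called")
}

# An object whose class vector starts with "demo.class"
obj <- structure(list(), class = c("demo.class", "list"))

# plot() is a generic: it picks plot.demo.class because of the class attribute
result <- plot(obj)

# inherits() checks class membership; getS3method() looks up the method
is.demo <- inherits(obj, "demo.class")
found <- is.function(getS3method("plot", "demo.class", optional = TRUE))
```

R tries each entry of the class vector in turn, so an object with class c("interval.censored", "data.frame") reaches plot.interval.censored() before any data frame method.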

You can either put plot.interval.censored() into a package, or add it to your .Rprofile, in which case it will always be available to you when you start R on your machine. Publishing a package might be preferred, as it is easier to share with others or install when you are not on your own machine. However, editing .Rprofile might be simpler. Hadley has a great overview of the S3 object system.
