Solved – Plotting interval censored follow-up time as a line chart

Tags: data visualization, r, survival

I'm working on a survival analysis project where it would be useful to visualize everyone's follow-up time and event times. The data consist of an ID, which of two possible events the individual had (A or B), the time at which the event occurred, and whether or not the observation is interval censored. Non-censored individuals have two identical event times, t1 and t2, while censored individuals have t1 =/= t2, giving the lower and upper bounds of the censoring interval.

This looks like so (all made up):

ID, eventA, eventB, t1, t2, censored
1, 0, 1, 7, 7, 0
2, 1, 0, 5, 5, 0 
3, 1, 0, 10, 10, 0
4, 0, 1, 4.5, 4.5, 0
5, 1, 0, 2, 8, 1

Where ID 5 is censored.

I'd like to produce a plot like this:

[Image: sketch of the desired plot]

Essentially, make the X-axis ID, the Y axis time. Draw a line segment from 0 to t1. Then if they're censored, draw another line segment from t1 to t2. Presumably if they're uncensored they'll just overlap, which is fine. Then drop a marker at t2 for each type of event, say an X if they're eventA and O if they're eventB.

I presume there's a way to do this in R, but I haven't yet really wrapped my mind around graphing in R, and most of the examples for graphs in survival analysis I'm finding are KM curves. Anyone have an idea?

Best Answer

There must be many ways to make follow-up time plots with interval censored data, although a quick Google search only found this image in an overview of censoring, which looks a bit busy to my eye.

Just to give another perspective, here's an approach using the ggplot2 package.

library(ggplot2)

# Your example data
dat <- structure(list(ID = 1:5, eventA = c(0L, 1L, 1L, 0L, 1L), 
    eventB = c(1L, 0L, 0L, 1L, 0L), t1 = c(7, 5, 10, 4.5, 2), t2 = c(7, 5, 10, 4.5, 
    8), censored = c(0, 0, 0, 0, 1)), .Names = c("ID", "eventA", 
    "eventB", "t1", "t2", "censored"), class = "data.frame", row.names = c(NA, -5L))

# Create event variable
dat$event <- with(dat, ifelse(eventA, "A", "B"))

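# Create id.ordered, a factor whose levels are the IDs sorted by decreasing t2
# (indexing dat$ID by order() keeps this correct even if the IDs are not 1..n)
# This will allow the plot to be ordered by increasing t2, if desired
dat$id.ordered <- factor(x = dat$ID, levels = dat$ID[order(dat$t2, decreasing = TRUE)])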

# Use ggplot to plot data from dat object
ggplot(dat, aes(x = id.ordered)) + 
    # Plot solid line representing non-interval censored time from 0 to t1
    geom_linerange(aes(ymin = 0, ymax = t1)) + 
    # Plot line (dotted for censored time) representing time from t1 to t2
    geom_linerange(aes(ymin = t1, ymax = t2, linetype = as.factor(censored))) +  
    # Plot points representing event
    # The ifelse() function moves censored marker to middle of interval
    geom_point(aes(y = ifelse(censored, t1 + (t2 - t1) / 2, t2), shape = event), 
        size = 4) +
    # Flip coordinates
    coord_flip() + 
    # Add custom name to linetype scale, 
    # otherwise it will default to "as.factor(censored)"
    scale_linetype_manual(name = "Censoring", values = c(1, 2), 
        labels = c("Not censored", "Interval censored")) +
    # Add custom shape scale.  Change the values to get different shapes.
    scale_shape_manual(name = "Event", values = c(19, 15)) +
    # Add main title and axis labels
    ggtitle("Patient follow-up") + xlab("Patient ID") + ylab("Days") + 
    # I think the bw theme looks better for this graph, 
    # but leave it out if you prefer the default theme
    theme_bw()

And the result:

[Image: Patient follow-up with interval censored data]

When making graphs with line-type, color, size, etc. conditional on data, I find ggplot2 more intuitive than base graphics and even trellis, although trellis is much faster when plotting bigger data.

I'm not sure if it is preferred to place the event marker in the middle of the censored interval or at the end. I chose the middle here, to emphasize that the event did not necessarily occur near the end of follow-up.

Addendum

Once you decide on a standard plot, and if you find yourself making these plots often, it can be convenient to wrap it up in a function and use R's S3 object system to dispatch the plotting method with a call to the plot() generic. The following does not relate directly to the original question, but for the sake of other readers interested in adding methods to the plot() generic I'll include it here. First, wrap up the ggplot call into a plot method function:

plot.interval.censored <- function(x, title = "Patient follow-up", 
        xlab = "Patient ID", ylab = "Days",
        linetype.values = c(1, 2), shape.values = c(19, 15), ...)
{
    # Derive the plotting variables from the argument x, not from a global object
    x$event <- with(x, ifelse(eventA, "A", "B"))
    x$id.ordered <- factor(x = x$ID, levels = x$ID[order(x$t2, decreasing = TRUE)])

    out <- ggplot(x, aes(x = id.ordered)) + 
            geom_linerange(aes(ymin = 0, ymax = t1)) + 
            geom_linerange(aes(ymin = t1, ymax = t2, linetype = as.factor(censored))) +  
            geom_point(aes(y = ifelse(censored, t1 + (t2 - t1) / 2, t2), shape = event), 
                    size = 4) +
            coord_flip() + 
            scale_linetype_manual(name = "Censoring", values = linetype.values, 
                    labels = c("Not censored", "Interval censored")) +
            scale_shape_manual(name = "Event", values = shape.values) +
            # labs() replaces opts(title = ...), which no longer exists in ggplot2
            labs(title = title, x = xlab, y = ylab) + 
            theme_bw()

    return(out)
}

Then, add interval.censored to the class of your data object:

class(dat) <- c("interval.censored", class(dat))

Now, the plot can be produced simply by calling:

plot(dat)

You can either put plot.interval.censored() into a package, or add it to your .Rprofile, in which case it will always be available to you when you start R on your machine. Publishing a package might be preferred, as it is easier to share with others or install when you are not on your own machine. However, editing .Rprofile might be simpler. Hadley has a great overview of the S3 object system.
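To see the dispatch mechanism in isolation, here is a minimal base-R sketch that needs no plotting package. The class name "followup" and the toy data are made up for illustration; the point is only that a generic such as summary() picks the method whose suffix matches the first matching entry of class(x), and that the object remains a data frame.

```r
# Toy S3 method for a hypothetical class "followup"
summary.followup <- function(object, ...) {
  c(n = nrow(object), n_censored = sum(object$censored == 1))
}

# Made-up data: one exact observation, one interval-censored observation
toy <- data.frame(t1 = c(7, 2), t2 = c(7, 8), censored = c(0, 1))

# Prepend the new class; "data.frame" stays in the class vector
class(toy) <- c("followup", class(toy))

res <- summary(toy)                       # dispatches to summary.followup()
still_df <- inherits(toy, "data.frame")   # data-frame methods still apply
```

Because the original class is kept at the end of the class vector, anything without a "followup" method falls through to the data-frame method.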

Related Solutions

Solved – Right-censored survival fit with JAGS

I was asked to re-post this answer here from my comment at http://doingbayesiandataanalysis.blogspot.com/2012/01/complete-example-of-right-censoring-in.html The specifics of this answer relate to the model in that comment, but the concepts apply to the topic here.

The core of the JAGS model for censored data is this:

isCensored[i] ~ dinterval( y[i] , censorLimitVec[i] )
y[i] ~ dnorm( mu , tau )

The key to understanding what JAGS is doing is that JAGS automatically imputes a random value for any variable that is not specified as a constant in the data. Thus, when y[i] is NA (i.e., a missing value, not a constant), then JAGS imputes a random value for it.

But what value should it generate?

The second line of the model, above, says that y[i] should be randomly generated from a normal distribution with mean mu and precision tau.

But the first line of the model, above, puts another constraint on the randomly generated value of y[i]. That line says that whatever value of y[i] is randomly generated, it must fall on the side of censorLimitVec[i] dictated by the value of isCensored[i].

To understand this part, let's unpack the dinterval() distribution. Suppose that censorLimitVec has 3 values in it, not just 1:

censorLimitVec = c(10,20,30)

Then randomly generated values from dinterval(y,c(10,20,30)) will be either 0, 1, 2, or 3 depending on whether $y<10$, $10 < y < 20$, $20<y<30$, or $30<y$. So, if $y=15$, dinterval(y,c(10,20,30)) has output of $1$ with 100% probability. The trick is this: We instead specify the output of dinterval, and impute a random value of y that could produce it. Thus, if we say

1 ~ dinterval(y,c(10,20,30))

then y is imputed as a random value between 10 and 20.
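The two directions described above can be mimicked in plain R, as a rough sketch rather than what JAGS does internally. Going from y to the interval index is what findInterval() computes (left.open = TRUE is used here on the assumption that dinterval treats a value exactly on a cut point as belonging to the lower interval). Going from the index back to y amounts to drawing from the normal distribution restricted to that interval. The values of mu and sigma are hypothetical, and JAGS parameterizes the normal by precision tau = 1 / sigma^2 rather than by sigma.

```r
limits <- c(10, 20, 30)

# Forward direction: which interval does each y fall in? (0, 1, 2 or 3)
idx <- findInterval(c(5, 15, 25, 35), limits, left.open = TRUE)

# Reverse direction: given index 1, draw y from a normal restricted to (10, 20)
set.seed(1)
mu <- 15; sigma <- 10            # hypothetical parameter values
lo <- limits[1]; hi <- limits[2]
u <- runif(1000, pnorm(lo, mu, sigma), pnorm(hi, mu, sigma))
y_imputed <- qnorm(u, mu, sigma) # inverse-CDF sampling from the truncated normal

# Every imputed value maps back to interval index 1
idx_imputed <- findInterval(y_imputed, limits, left.open = TRUE)
```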

Putting the two model statements together,

1 ~ dinterval( y , censorLimit )

y ~ dnorm( mu , tau )

means that y comes from a normal density and y must fall above the censorLimit.
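On the R side, the data for such a model might be assembled as in the sketch below. The names y, isCensored and censorLimitVec follow the model above; the simulated values, the limit of 110 and the element N are assumptions for illustration. Censored observations are passed as NA so that they are imputed, and as far as I know the initial values for those entries should lie above the limit, since a starting value on the wrong side is inconsistent with isCensored = 1.

```r
# Hypothetical data: values above 'censor_limit' were not observed exactly
set.seed(42)
true_y <- rnorm(20, mean = 100, sd = 15)
censor_limit <- 110

is_censored <- as.numeric(true_y > censor_limit)  # 1 = censored, 0 = observed
y <- ifelse(is_censored == 1, NA, true_y)         # censored values become NA

data_list <- list(
  y = y,
  isCensored = is_censored,
  censorLimitVec = rep(censor_limit, length(y)),
  N = length(y)                                   # assumed loop bound in the model
)

# Starting values only for the NA entries, placed above the censoring limit
y_init <- ifelse(is_censored == 1, censor_limit + 1, NA)
```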

Hope that helps!!

Solved – Survival Analysis—Equal follow up time

Yes, these are survival analysis/event history analysis data.

The beginning of time in survival analysis is rarely calendar time, but is the first day the individual was observed in the study. This affects your interpretation in that intervention/treatment effects are understood to affect person time (e.g. to affect the hazard function in a given abstracted notion of "days since start of observation", or "days since diagnosis" or "days since treatment"... depending on the nature of your study design), rather than affecting the hazard function in terms of calendar time (i.e. you are not trying to estimate change in hazard due to treatment on June 3rd, 2014).

If you only followed people for 6 months: that's 180 days; unless everyone experienced readmission by 180 days, there should be some right censoring, and the survival curve should not plummet to 0 at 180 days.
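A small simulation illustrates the point about a fixed 180-day follow-up. This is a sketch with made-up data: readmission times are drawn from an exponential distribution with a mean of 150 days (an arbitrary choice), everyone still event-free at day 180 is right censored there, and the Kaplan-Meier estimate is computed with the survival package.

```r
library(survival)  # recommended package, normally installed with R

set.seed(123)
n <- 200
# Hypothetical time to readmission in days, measured from start of observation
time_to_event <- rexp(n, rate = 1 / 150)
follow_up <- 180   # everyone is followed for at most 180 days

time   <- pmin(time_to_event, follow_up)          # observed time
status <- as.numeric(time_to_event <= follow_up)  # 1 = readmitted, 0 = right censored

fit <- survfit(Surv(time, status) ~ 1)
n_censored <- sum(status == 0)
final_surv <- min(fit$surv)  # Kaplan-Meier estimate at the end of follow-up
```

Because the only censoring here happens at day 180, the final Kaplan-Meier estimate equals the proportion of people not readmitted by then, so the curve levels off above 0 rather than dropping to it.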
