R – Risk Table for a Kaplan-Meier Plot

ggplot2, kaplan-meier, r, risk, survival

I need to make a Kaplan-Meier plot with an at-risk (risk-set) table beneath it. In other words, I need a table of the number of subjects still at risk at selected time points, aligned below the figure. I found a website that explains how to do this for a plot with multiple subgroups, using a function called ggkm; the code for it is available here.

# Example of a plot like this
library(survival)
data(colon)
fit <- survfit(Surv(time,status)~rx, data=colon)
# ggkm() is not part of a package: source its code from the website above first
ggkm(fit, timeby=500, ystratalabs=c("Obs","Lev","Lev+5FU"))

[Figure: example of a plot like this]

Printing the survfit object used to make the plot above gives:

Call: survfit(formula = Surv(time, status) ~ rx, data = colon)

           records n.max n.start events median 0.95LCL 0.95UCL
rx=Obs         630   630     630    345   1723    1323    2213
rx=Lev         620   620     620    333   1709    1219    2593
rx=Lev+5FU     608   608     608    242     NA      NA      NA
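One thing to keep in mind when reading these counts: survival's colon data set stores two records per patient, one for recurrence (etype == 1) and one for death (etype == 2), so the "records" column counts rows rather than patients. A minimal sketch, assuming you want each patient counted once per arm (keeping the death records is only an illustrative choice):

```r
library(survival)

# colon has one recurrence row (etype 1) and one death row (etype 2) per patient
table(colon$etype)
length(unique(colon$id))

# Keep only the death records so each patient appears once
deaths <- subset(colon, etype == 2)
fit_death <- survfit(Surv(time, status) ~ rx, data = deaths)

# Starting number at risk in each arm
fit_death$n
```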

My question:
My plot only has one group. The call looks like this:

Call: survfit(formula = Surv(Recur_day/365.242, Recur) ~ 1, data = study_data)

records   n.max n.start  events  median 0.95LCL 0.95UCL 
    440     440     440      92      NA      NA      NA 

[Figure: my plot]

When I try to use the ggkm function, I get this error:

 ggkm(survfit(formula = Surv(Recur_day/365.242, Recur) ~ 1, data = study_data), timeby = 2)
Error in data.frame(time = sfit$time[subs2], n.risk = sfit$n.risk[subs2],  : 
  arguments imply differing number of rows: 1, 0
In addition: Warning message:
In max(nchar(ystratalabs)) :
  no non-missing arguments to max; returning -Inf
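A likely explanation, inferred from the error and the warning rather than from ggkm's source: ggkm appears to derive its group labels (ystratalabs) and per-group rows from the fit's strata component, and a survfit() fit with ~ 1 has no strata component at all. The sketch below uses colon as a stand-in for study_data to show the difference, and reproduces the warning by taking max() of an empty vector:

```r
library(survival)

one_group <- survfit(Surv(time, status) ~ 1, data = colon)
by_rx     <- survfit(Surv(time, status) ~ rx, data = colon)

# A single-curve fit has no strata component; a grouped fit has one entry per level
is.null(one_group$strata)
length(by_rx$strata)
names(by_rx$strata)

# Empty labels give the same warning as above:
# "no non-missing arguments to max; returning -Inf"
labels <- names(one_group$strata)  # NULL for a single curve
max(nchar(labels))
```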

Does anyone have an idea of what I'm doing wrong? Or is there an easier way to do this by hand? I am open to typing the numbers into a text box in PowerPoint myself; I'm just not sure how to get the number of subjects at risk at each time point (I'd like them at two years, four years, six years, etc.).
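Whatever the fix for ggkm, the counts themselves can be read straight off the fit: summary() for survfit objects accepts a times argument and returns n.risk, the number still at risk at each of those times. A minimal sketch, using colon as a stand-in for study_data (its time and status columns play the role of Recur_day and Recur) and converting days to years as in the question; the base-graphics placement of the counts is one possible layout, not ggkm's:

```r
library(survival)

# Stand-in for study_data: colon's time is in days, so divide by 365.242
# to work in years, matching Surv(Recur_day/365.242, Recur)
fit <- survfit(Surv(time / 365.242, status) ~ 1, data = colon)

# Number at risk at 0, 2, 4, 6 and 8 years
at_years <- seq(0, 8, by = 2)
s <- summary(fit, times = at_years)
risk_table <- data.frame(years = s$time, n.risk = s$n.risk)
print(risk_table)

# Optional: enlarge the bottom margin and write each count under its time point
op <- par(mar = c(7, 4, 2, 2) + 0.1)
plot(fit, xlab = "Years", ylab = "Survival probability")
mtext("Number at risk", side = 1, line = 4, adj = 0, cex = 0.8)
mtext(risk_table$n.risk, side = 1, line = 5, at = risk_table$years, cex = 0.8)
par(op)
```

The same risk_table can also be copied into a PowerPoint text box if you prefer to lay it out by hand.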

Best Answer

You would be well advised to check that code carefully. If you count only the cases with complete data, the starting numbers in the risk sets are noticeably different:

sum(na.omit(colon)$rx=="Obs")
[1] 610   # was 630 in figure above, but could be missing other covariates
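A quick way to see which covariates cause the difference is to count missing values per column and compare the group sizes before and after na.omit():

```r
library(survival)

# Missing values per column of colon
colSums(is.na(colon))

# Records per treatment arm: all rows vs. complete cases only
table(colon$rx)
table(na.omit(colon)$rx)
```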

This tabulates the records in each 500-day interval and then serially subtracts the number of subjects whose follow-up ended (by an event or by censoring) in each prior interval:

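# Tally records per rx group in each 500-day interval; cut() returns NA for times
# beyond the last break (the largest multiple of 500 <= max(time)), and table() drops them
risksets <- with(na.omit(colon[ , c("time", "status","rx")]),
                      table(rx, cut(time, seq(0, max(time), by=500) ) ))
# Start from each group's total and subtract the interval counts in turn
sapply(1:3, function(i) 
              Reduce("-",  risksets[i,], init=rowSums(risksets)[i], accumulate=TRUE))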
#-----------------
    [,1] [,2] [,3]
Obs  621  612  594
Obs  461  456  484
Obs  363  352  411
Obs  306  310  373
Obs  247  258  314
Obs   81   99  113
Obs    0    0    0

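As is typical with R *apply functions, the result comes back transposed: each group is returned as a matrix column (Obs, Lev and Lev+5FU, in that order), which is why every row is labelled "Obs". Wrap the call in t() to get one row per group.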
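As a cross-check on the hand tally, summary() for survfit objects reports the number at risk at any requested times, here every 500 days by arm. A minimal sketch; because it counts everyone still under follow-up at each time point and does not drop records beyond the last break, its figures need not match the table above exactly (the day-0 column should equal the n.start values printed earlier, 630, 620 and 608):

```r
library(survival)

fit <- survfit(Surv(time, status) ~ rx, data = colon)

# Number at risk in each arm at 0, 500, ..., 3000 days
s <- summary(fit, times = seq(0, 3000, by = 500))

# One row per arm, one column per time point
risk_wide <- xtabs(n.risk ~ strata + time,
                   data = data.frame(strata = s$strata, time = s$time,
                                     n.risk = s$n.risk))
risk_wide
```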
