Solved – Survival Analysis using interval censored data: Please help

interval-censoring, r, survival

I have a database of 22,720 nurses with four observation points: Jan 2011, Jan 2012, Jan 2013 and Jan 2014. At each observation point I know whether or not they had developed a condition. Some new nurses entered the study at each observation point, so I rescaled the time axis so that all nurses start at time 0.
If they are disease-free in 2011 but diseased in 2013, I assume they developed the disease within a period of 2 years. Most of them never develop the disease. So my final data looks like this:

                observed 1yr   observed 2yr   observed 3yr
    disease 0:       177            937          19933
    disease 1:       642            482            549


        #So, simulating some data:
        n.0.disease.1year<-data.frame(event=rep(0,times=177), right=rep(1,times=177))
        n.0.disease.2year<-data.frame(event=rep(0,times=937), right=rep(2,times=937))
        n.0.disease.3year<-data.frame(event=rep(0,times=19933), right=rep(3,times=19933))
        n.1.disease.1year<-data.frame(event=rep(1,times=642), right=rep(1,times=642))
        n.1.disease.2year<-data.frame(event=rep(1,times=482), right=rep(2,times=482))
        n.1.disease.3year<-data.frame(event=rep(1,times=549), right=rep(3,times=549))

        my_data<-rbind(n.0.disease.1year, n.0.disease.2year, n.0.disease.3year, n.1.disease.1year, n.1.disease.2year, n.1.disease.3year)

Now, I understand that this is interval-censored data, so I need to use the survival package in R as follows:

    require('survival')
    my_data$left<-0
    my.surv<-Surv(my_data$left,my_data$right, type='interval2')



    sf.my.surv<-survfit(my.surv ~ 1, data = my_data)
    summary(sf.my.surv)
    # Or alternatively I can use
    my_data$event[my_data$event==1]<-3
    my.surv.2<-Surv(my_data$left,my_data$right, my_data$event,type='interval')
    sf2.my.surv <- survfit(my.surv.2 ~ 1, data = my_data)
    summary(sf2.my.surv)

    # With the first option I get the result
    time n.risk n.event survival std.err lower 95% CI upper 95% CI
      0.5  22720   22720        0     NaN           NA           NA
    # With the second option I get the result
    time n.risk n.event survival std.err lower 95% CI upper 95% CI
      0.5  22720   22720        0     NaN           NA           NA

Could someone please tell me what I am doing wrong? Thanks in advance

Best Answer

The issue is that you've misspecified your response interval.

For models with an interval-censored response, each subject has a response given by the interval $(L_i, R_i]$, where $L_i$ is the last time subject $i$ was known not to have experienced the event and $R_i$ is the first time subject $i$ was known to have experienced it. So in your study, if a nurse tested negative at year 2 and positive at year 3, their response interval should be $(2, 3]$.

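But as you've coded your data, left is constant at 0, so every nurse's interval starts at 0. Because all of these intervals contain $(0, 1]$, the NPMLE (i.e. the fit provided by survfit) places all of the probability mass there, which is why the estimated survival drops to 0 at the first reported time.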

If I understand your problem correctly, subjects who were observed for 1 year only and never tested positive should be coded as $(1, \infty)$, whereas subjects who tested positive in the first year should, I think, be coded as $(0, 1]$ (I say "I think" because this assumes the subjects had only been exposed for 1 year, which may not be the case in this study!). Next, subjects who were followed for 2 years and never tested positive should be coded as $(2, \infty)$, while subjects who were followed for two years and tested negative after one year but positive after two should be coded as $(1, 2]$.
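
As a rough sketch of this recoding in R, the block below rebuilds the data from the counts in the question and assigns each nurse an interval. It rests on an assumption the question does not confirm: a nurse who tested positive at year t is taken to have tested negative at year t - 1, so her interval is $(t - 1, t]$; the year-3 group is handled by analogy with the first two years. The names counts and nurses are illustrative. With type = "interval2", a right-censored interval $(t, \infty)$ is encoded by setting the right endpoint to NA.

```r
library(survival)

# Counts taken from the table in the question:
# event = 0 means never tested positive, year = time of last observation.
counts <- data.frame(
  event = c(0, 0, 0, 1, 1, 1),
  year  = c(1, 2, 3, 1, 2, 3),
  n     = c(177, 937, 19933, 642, 482, 549)
)

# Expand to one row per nurse.
nurses <- counts[rep(seq_len(nrow(counts)), counts$n), c("event", "year")]

# ASSUMPTION: a positive test at `year` was preceded by a negative test
# one year earlier.
#   event:    (year - 1, year]
#   no event: (year, Inf), written as right = NA for type = "interval2"
nurses$left  <- ifelse(nurses$event == 1, nurses$year - 1, nurses$year)
nurses$right <- ifelse(nurses$event == 1, nurses$year, NA_real_)

# Nonparametric fit for interval-censored data.
fit <- survfit(Surv(left, right, type = "interval2") ~ 1, data = nurses)
summary(fit)
```

If the actual last negative test date is known for each nurse, it would be better to use that as left rather than year - 1.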