Solved – Survival Analysis using interval censored data: Please help

interval-censoring r survival

I have a database of 22,720 nurses with four observation points: Jan 2011, Jan 2012, Jan 2013 and Jan 2014. At each observation point I know whether they have developed a condition or not. Some new nurses entered the study at each observation point, so I rescaled the time scale so that all nurses start at time 0.
If they are disease-free in 2011 but diseased in 2013, I assume they developed the disease within a period of 2 years. Most of them never develop the disease. So my final data looks like this:

                  observed 1yr   observed 2yr   observed 3yr
    disease 0:        177            937           19933
    disease 1:        642            482             549


    # So, simulating some data:
    n.0.disease.1year <- data.frame(event = rep(0, times = 177), right = rep(1, times = 177))
    n.0.disease.2year <- data.frame(event = rep(0, times = 937), right = rep(2, times = 937))
    n.0.disease.3year <- data.frame(event = rep(0, times = 19933), right = rep(3, times = 19933))
    n.1.disease.1year <- data.frame(event = rep(1, times = 642), right = rep(1, times = 642))
    n.1.disease.2year <- data.frame(event = rep(1, times = 482), right = rep(2, times = 482))
    n.1.disease.3year <- data.frame(event = rep(1, times = 549), right = rep(3, times = 549))

    my_data <- rbind(n.0.disease.1year, n.0.disease.2year, n.0.disease.3year,
                     n.1.disease.1year, n.1.disease.2year, n.1.disease.3year)

Now, I understand that this is interval-censored data, so I need to use the survival package in R as follows:

    require('survival')
    my_data$left<-0
    my.surv<-Surv(my_data$left,my_data$right, type='interval2')



    sf.my.surv<-survfit(my.surv ~ 1, data = my_data)
    summary(sf.my.surv)
    # Or alternatively I can use
    my_data$event[my_data$event==1]<-3
    my.surv.2<-Surv(my_data$left,my_data$right, my_data$event,type='interval')
    sf2.my.surv <- survfit(my.surv.2 ~ 1, data = my_data)
    summary(sf2.my.surv)

    # With the first option I get the result
    time n.risk n.event survival std.err lower 95% CI upper 95% CI
      0.5  22720   22720        0     NaN           NA           NA
    # With the second option I get the result
    time n.risk n.event survival std.err lower 95% CI upper 95% CI
      0.5  22720   22720        0     NaN           NA           NA

Could someone please tell me what I am doing wrong? Thanks in advance

Best Answer

The issue is that you've misspecified your response interval.

For models with an interval-censored response, each subject has a response variable given by $(L_i, R_i]$, where $L_i$ is the last time subject $i$ was known not to have experienced the event and $R_i$ is the first time subject $i$ was known to have experienced it. So in your study, if a nurse tested negative at year 2 and positive at year 3, they should have the response interval $(2, 3]$.

But as you've coded your data, left is constant at 0. Every interval therefore has the form $(0, R_i]$ with $R_i \ge 1$, so all of them contain $(0, 1]$, and the NPMLE (i.e. the fit provided by survfit) places all the probability mass there.
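To make the overlap concrete, here is a minimal base-R sketch. The vectors `left` and `right` are illustrative names holding the three distinct intervals that the question's coding produces. The midpoint of the common region happens to equal the single time (0.5) shown in the question's output, which is presumably how survfit labels that region.

```r
# The three distinct intervals produced by left = 0 and right = 1, 2, 3.
left  <- c(0, 0, 0)
right <- c(1, 2, 3)

# Region shared by all intervals: (largest left end, smallest right end].
common <- c(lower = max(left), upper = min(right))
common

# Midpoint of the shared region.
midpoint <- mean(common)
midpoint
```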

If I understand your problem correctly, subjects who were observed for 1 year only and never tested positive should be coded as $(1, \infty)$, whereas subjects who tested positive in the first year should, I think, be coded as $(0, 1]$ (I say "I think" because this is conditional on the subjects only being exposed for 1 year, which may not be the case in this study!). Next, subjects who were followed for 2 years and never tested positive should be coded as $(2, \infty)$, while subjects who were followed for two years and tested negative after one year but positive after two should be coded as $(1, 2]$.
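A rough sketch of this recoding in R, using the counts from the table in the question. It rests on an assumption the question does not confirm: every nurse who tested positive is taken to have tested negative exactly one year earlier (or to have entered at time 0). The names `counts` and `nurses` are made up for illustration.

```r
library(survival)

# Counts taken from the table in the question.
counts <- data.frame(
  years   = c(1, 2, 3, 1, 2, 3),
  disease = c(0, 0, 0, 1, 1, 1),
  n       = c(177, 937, 19933, 642, 482, 549)
)

# Expand to one row per nurse.
nurses <- counts[rep(seq_len(nrow(counts)), counts$n), c("years", "disease")]

# ASSUMPTION: a diseased nurse was last seen disease-free one year before
# the first positive observation, giving (years - 1, years].
# Nurses who never tested positive are right censored: (years, Inf).
nurses$left  <- ifelse(nurses$disease == 1, nurses$years - 1, nurses$years)
nurses$right <- ifelse(nurses$disease == 1, nurses$years, Inf)

# With type = "interval2", an infinite right end marks a right-censored subject.
fit <- survfit(Surv(left, right, type = "interval2") ~ 1, data = nurses)
summary(fit)
```

If the actual visit history is available for each nurse, the last negative visit should replace the `years - 1` shortcut used here.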

Related Solutions

Solved – Survival analysis in SAS-is there a way to include a random effect with interval censored data

In SAS, you can use PROC PHREG, which has a RANDOM statement for specifying a random effect.

For example, if the variable dish identifies your clusters:

    proc phreg data=survGeno2;
       class dish geno;
       model Time*Status(0)=geno;
       random dish;  /* assigns the cluster (random) effect */
       hazardratio 'Frailty Model Analysis' geno;
    run;

Reference: http://support.sas.com/documentation/cdl/en/statug/63962/HTML/default/viewer.htm#statug_phreg_sect056.htm

For a good teaching video, see https://www.youtube.com/watch?v=ZfgRBuM4u3U

There are other ways to specify the frailty distribution besides the normal distribution. You can also use PROC NLMIXED or other procedures.

R – Right Censored Survival Analysis with Interval Data in R

As implied by the tag, this is referred to as interval censoring. In this scenario, each observation has a pair of values that I call the observation interval: $(L_i, R_i]$, where $L_i$ represents a lower bound on the true value of interest and $R_i$ represents an upper bound on the true value of interest. For example, suppose you go to a dentist at age 9 and have no cavities. You go again at age 12 and have one cavity. Then all you know about your "age at first cavity" is that it is in the interval $(9,12]$.

Note: you will need to slightly change your data to put it in this format. For example, you have the interval $(40, 60]$ for subject C but you also state that they are right censored. I assume this means that we know that for subject C, at time 60 the event of interest had not occurred yet. In that case, you should represent their observation interval as $(60, \infty)$.
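As a minimal sketch of that conversion, the hypothetical helper below (not from any package) turns a subject's visit history into an observation interval. It assumes visits are recorded as times with a logical flag for whether the event had occurred by that visit, and that time 0 is the lower bound when no negative visit precedes the first positive one.

```r
# Hypothetical helper: build the observation interval (L, R] for one subject.
# visit_times: times at which the subject was examined
# event_seen : TRUE if the event had already occurred at that visit
observation_interval <- function(visit_times, event_seen) {
  stopifnot(length(visit_times) == length(event_seen))
  if (any(event_seen)) {
    first_pos <- min(visit_times[event_seen])
    negatives <- visit_times[!event_seen & visit_times < first_pos]
    # ASSUMPTION: with no earlier negative visit, the lower bound is time 0.
    left  <- if (length(negatives) > 0) max(negatives) else 0
    right <- first_pos
  } else {
    # Event never seen: right censored at the last visit.
    left  <- max(visit_times)
    right <- Inf
  }
  c(left = left, right = right)
}

# Dentist example: no cavity at age 9, one cavity at age 12.
dentist <- observation_interval(c(9, 12), c(FALSE, TRUE))

# Subject C: examined at 40 and 60, event not seen at either visit.
subject_c <- observation_interval(c(40, 60), c(FALSE, FALSE))

dentist
subject_c
```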

A variety of tools exist for analyzing interval-censored data. One of the most basic is the non-parametric maximum likelihood estimator (NPMLE), which is essentially an extension of the Kaplan-Meier curve that allows for interval censoring. For hypothesis tests comparing two groups, the log-rank statistics have been generalized to allow for interval censoring. Finally, survival regression models (proportional hazards, accelerated failure time and proportional odds, to name a few) can be used.

In R, the NPMLE and regression models can be found in my icenReg package. The log-rank statistic can be found in the interval package.
