Solved – Setting up dataset for Cox regression with time-dependent variables

Tags: cox-model, dataset, regression, survival

I asked how to code this analysis in an earlier question (Psychometrics: Survival analysis of help seeking behaviors), and @Fomite suggested that I ask a separate question about whether my data are set up correctly.

Background:
I'm studying help-seeking. Participants described their contacts with between 1 and 3 "responders" (e.g., friends, the police), in the order they contacted them. For example, a participant could have contacted just responder 1, or responder 1, then responder 2, then responder 3. I'm trying to predict help-seeking dropout: for example, a participant who contacted responder 1 but did not go on to contact a second or third responder has a dropout at responder 1. So, unlike a typical survival analysis, the observations are responders rather than time points, but they are still ordered in time. The independent variables in my model include characteristics of the people seeking help (e.g., gender) and aspects of their interactions with the responders (e.g., whether they liked the interaction). The data are right-censored for participants who said they contacted more than three responders, because the survey could record at most three. Two people reported only on responder 3; they are left-censored because their data for responders 1 and 2 are missing.
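The dropout and censoring rules described above can be sketched as a small helper. This is a minimal sketch, not code from the post: `dropout_point` and its arguments are hypothetical, and it assumes each participant's record gives the number of their last reported responder and whether they said they contacted more than three.

```python
from typing import Tuple


def dropout_point(last_responder: int, contacted_more_than_three: bool) -> Tuple[int, int]:
    """Return (last recorded responder, event indicator) for one participant.

    Hypothetical helper. event = 1 means a dropout was observed after the last
    recorded responder; event = 0 means the participant is right-censored
    because they contacted more responders than the survey could record.
    """
    if not 1 <= last_responder <= 3:
        raise ValueError("participants report between 1 and 3 responders")
    if contacted_more_than_three:
        # Only someone who filled in all three responder slots can report more.
        if last_responder != 3:
            raise ValueError("only a participant with 3 recorded responders can report more")
        return 3, 0
    # Otherwise the last recorded responder is where help-seeking stopped.
    return last_responder, 1
```

For example, `dropout_point(1, False)` describes a dropout at responder 1, while `dropout_point(3, True)` describes a participant censored at responder 3.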

Current data setup:
The data are set up as a person-period dataset with one line per responder, so most participants have multiple lines. Responders are nested within participants. For example, a participant who contacted two responders has two lines in the dataset; the participant-level data are the same on both lines, and the responder-level data differ.

Here's what the data look like now: https://flic.kr/p/s1fW6k

Variables:

Participant-level:

  • id is the ID number for the participant.
  • gender is the participant's gender; 1 = woman and 2 = man.

Responder-level:

  • responder represents the responder number in the order that the participant contacted them. Possible values are 1, 2, and 3.
  • stoppedhelpseeking represents whether the participant stopped seeking help/dropped out after contacting that responder; 0 = no and 1 = yes. This is the event that I'm hoping to predict.
  • likedresponder represents whether the participant liked their interaction with the responder; 0 = no and 1 = yes.
  • censor represents whether the participant had not dropped out by responder 3, i.e., reported contacting more than three responders. It is coded only on their last recorded responder in the dataset.
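To make the person-period layout concrete, here is a minimal sketch that expands one participant into one row per responder using the variable names listed above. The `Participant` record and `person_period_rows` function are hypothetical, and the sketch assumes responders are numbered from 1, so it does not cover the two participants who reported only on responder 3.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Participant:
    # Hypothetical input record, one per participant.
    id: int
    gender: int                      # 1 = woman, 2 = man
    liked: List[int]                 # likedresponder for responders 1..k, in contact order
    contacted_more_than_three: bool  # True if they reported more than three responders


def person_period_rows(p: Participant) -> List[Dict[str, int]]:
    """Expand one participant into one row per responder (person-period format)."""
    k = len(p.liked)
    if not 1 <= k <= 3:
        raise ValueError("each participant reports between 1 and 3 responders")
    if p.contacted_more_than_three and k != 3:
        raise ValueError("only a participant with 3 recorded responders can report more")
    rows = []
    for responder, liked in enumerate(p.liked, start=1):
        is_last = responder == k
        rows.append({
            "id": p.id,
            "gender": p.gender,           # participant-level: repeated on every row
            "responder": responder,
            "likedresponder": liked,      # responder-level: varies by row
            # A dropout is recorded only on the last row, and only if not censored.
            "stoppedhelpseeking": int(is_last and not p.contacted_more_than_three),
            # censor is coded only on the last recorded responder.
            "censor": int(is_last and p.contacted_more_than_three),
        })
    return rows
```

A participant who contacted two responders yields two rows with `stoppedhelpseeking` equal to 0 and then 1, which matches the layout described in the question.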

My question:
@Fomite pointed out that, with my data set up this way, every responder row with stoppedhelpseeking = 0 will be treated as a censored event time. That doesn't sound like what I want: I'm trying to predict the point at which people drop out of help-seeking, so censoring should occur at the participant level, not the responder level. What's the right way to set up the data to model this correctly?

Best Answer

Actually, censoring every row with stoppedhelpseeking=0 is exactly what you want.

Having a subject with a time-dependent covariate is equivalent to having that subject leave the study when the covariate changes and another subject, with slightly different covariates, enter the study at the same moment. So you do want every row with stoppedhelpseeking=0 to look censored. You then want to say that each new "version" of the person entered the study only when the previous version exited: the row for responder 1 is in the study for the "time" interval (0, 1], the row for responder 2 for (1, 2], and so on. In this setup your censor variable is irrelevant, because a participant whose final row has stoppedhelpseeking=0 is already treated as censored at that point.
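The interval construction described above can be sketched as follows, assuming rows shaped like the person-period dataset in the question (keys `id`, `gender`, `responder`, `likedresponder`, `stoppedhelpseeking`, `censor`). The function names `to_counting_process` and `check_intervals` are hypothetical; the point is that each row becomes a (start, stop] interval with start = responder - 1, and the censor column is dropped.

```python
from typing import Dict, List


def to_counting_process(rows: List[Dict[str, int]]) -> List[Dict[str, int]]:
    """Convert person-period rows into (start, stop] intervals.

    Each responder row covers the "time" interval (responder - 1, responder].
    A row whose event is 0 is censored at the end of its interval, and the
    same participant re-enters with their next row. The censor column is not
    needed in this layout, so it is not copied over.
    """
    return [
        {
            "id": r["id"],
            "start": r["responder"] - 1,
            "stop": r["responder"],
            "event": r["stoppedhelpseeking"],
            "gender": r["gender"],
            "likedresponder": r["likedresponder"],
        }
        for r in rows
    ]


def check_intervals(cp: List[Dict[str, int]]) -> None:
    """Raise ValueError if a participant's intervals have gaps or an early event."""
    by_id: Dict[int, List[Dict[str, int]]] = {}
    for r in cp:
        by_id.setdefault(r["id"], []).append(r)
    for pid, rs in by_id.items():
        rs.sort(key=lambda r: r["start"])
        for a, b in zip(rs, rs[1:]):
            if a["stop"] != b["start"]:
                raise ValueError(f"participant {pid}: gap or overlap between intervals")
            if a["event"] == 1:
                raise ValueError(f"participant {pid}: event before the final interval")
```

The check encodes the "one version exits, the next enters" picture: each participant's intervals must be contiguous, and only their last interval may carry the dropout.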

I don't know SAS, but in R you would set up the response variable here as Surv(responder-1, responder, stoppedhelpseeking). For more information, see section 2 of this vignette. I imagine it's similar in other statistical software.
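To see why censoring the intermediate rows is harmless, here is a minimal sketch of the Cox partial log-likelihood for (start, stop] data with a single covariate, which is the quantity a Cox fit on Surv(start, stop, event) maximizes. It is illustrative only: the function name and the toy data used to check it are made up, ties are handled with the Breslow approximation for simplicity (R's `coxph` uses the Efron approximation by default), and a real analysis should use a survival package rather than this code.

```python
import math
from typing import Dict, List


def breslow_partial_loglik(rows: List[Dict[str, float]], beta: float, covariate: str) -> float:
    """Cox partial log-likelihood for (start, stop] rows, Breslow tie handling.

    A row is at risk at event time t when start < t <= stop, which is how
    the counting-process form Surv(start, stop, event) defines the risk set.
    """
    event_times = sorted({r["stop"] for r in rows if r["event"] == 1})
    loglik = 0.0
    for t in event_times:
        at_risk = [r for r in rows if r["start"] < t <= r["stop"]]
        events = [r for r in at_risk if r["stop"] == t and r["event"] == 1]
        denom = sum(math.exp(beta * r[covariate]) for r in at_risk)
        # Breslow: every tied event at t shares the same full risk-set denominator.
        loglik += sum(beta * r[covariate] for r in events) - len(events) * math.log(denom)
    return loglik
```

A participant's responder-1 row is in the risk set at t = 1 and then leaves; if they went on to contact responder 2, their responder-2 row carries them into t = 2. Each participant is therefore counted exactly once at every "time" they were at risk, which is why the intermediate censoring does not distort the model. In Python, the lifelines package's `CoxTimeVaryingFitter` accepts data in this start/stop/event layout; check its documentation for the exact arguments.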

EDIT: Based on @Fomite's answer to your other question, you can probably use something like entry=responder-1 to encode this?