I posted a question about how to set up the code for this analysis (Psychometrics: Survival analysis of help seeking behaviors), and @Fomite suggested that I post a separate question about whether my data are set up correctly.
Background:
I'm studying people seeking help. Participants described their contacts with between 1 and 3 "responders" (e.g., friends, the police) in the order they contacted them. For example, a participant could have contacted just responder 1, or responder 1, then responder 2, then responder 3. I'm trying to predict help-seeking dropout: for example, a participant who contacted responder 1 but did not go on to contact a second or third responder would have a dropout at responder 1. So unlike a typical survival analysis, the observations are responders rather than time points, but they're still ordered in time. The independent variables in my model include characteristics of the people seeking help (e.g., gender) and aspects of their interactions with the responders (e.g., whether they liked the interaction). The data are right-censored for participants who said that they contacted more than three responders, because the survey could record at most three responders. Two people reported only on responder 3; they are left-censored because data are missing for responders 1 and 2.
Current data setup:
The data are set up as a person-period dataset with one row per responder, so most participants have multiple rows. Responders are nested within participants. For example, a participant who contacted two responders has two rows in the dataset; the participant-level data are the same in both rows and the responder-level data differ.
Here's what the data look like now: https://flic.kr/p/s1fW6k
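A minimal Python sketch of the person-period layout described above, under stated assumptions: the field names (`pid`, `gender`, `liked`, `contacted_more_than_3`) are hypothetical stand-ins rather than the survey's actual variables, and every participant's responders are assumed to be numbered from 1. It expands one record per participant into one row per responder, repeating participant-level values on every row, and sets `stoppedhelpseeking` to 1 only on the last reported responder of participants who did not report contacting more than three responders.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Participant:
    # Hypothetical participant record; field names are illustrative only.
    pid: int
    gender: str                  # participant-level covariate
    liked: List[bool]            # one responder-level covariate per reported responder, in order
    contacted_more_than_3: bool  # said they contacted more than three responders


def to_person_period(people: List[Participant]) -> List[Dict]:
    """Expand each participant into one row per reported responder."""
    rows = []
    for p in people:
        n = len(p.liked)
        # Assumes responders are numbered from 1 (participants who reported
        # only responder 3 would need separate handling).
        for k, liked in enumerate(p.liked, start=1):
            # Dropout (the event) is observed at the last reported responder,
            # unless the participant contacted more than three responders,
            # in which case they are right-censored and never get an event.
            stopped = int(k == n and not p.contacted_more_than_3)
            rows.append({
                "pid": p.pid,
                "gender": p.gender,  # repeated on every row for this participant
                "responder": k,      # differs across this participant's rows
                "liked": liked,
                "stoppedhelpseeking": stopped,
            })
    return rows


people = [
    Participant(pid=1, gender="F", liked=[True, False], contacted_more_than_3=False),
    Participant(pid=2, gender="M", liked=[True, True, True], contacted_more_than_3=True),
]
rows = to_person_period(people)
for r in rows:
    print(r)
```

Here participant 1 gets two rows with the dropout on responder 2, while participant 2 gets three rows that all have `stoppedhelpseeking = 0` because they are right-censored.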
Variables:
Participant-level:
Responder-level:
My question:
@Fomite pointed out that, with my data set up this way, each responder with a 0 value for stoppedhelpseeking will have a censored event time. That doesn't sound like what I want: I'm trying to predict the point at which people drop out of help-seeking. Censoring should occur at the participant level, not the responder level. What's the right way to set this up so the model is specified correctly?
Best Answer
Actually, censoring everything with `stoppedhelpseeking=0` is what you want. Having a subject with a time-dependent covariate is the same as having that subject leave the study when the covariate changes and another subject with slightly different covariates enter the study at the same time. So you do want it to look like everything with `stoppedhelpseeking=0` got censored. Then you want to claim that this person only entered the study when their previous version exited it: the person with responder 1 was in the study for the "time" interval (0, 1], the person with responder 2 was there for the time interval (1, 2], and so on. In this case your `censor` variable is actually irrelevant.

I don't know SAS, but in R, the way you would set up the response variable here is
`Surv(responder-1, responder, stoppedhelpseeking)`. For more info, see section 2 of this vignette. I imagine it's similar in other statistical software.

EDIT: based on @Fomite's answer to your other question, you can probably use something like `entry=responder-1` to encode this?
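To make the interval idea concrete, here is a small Python sketch (not SAS or R) that builds the columns behind `Surv(responder-1, responder, stoppedhelpseeking)`: `start = responder - 1`, `stop = responder`, `event = stoppedhelpseeking`. The row layout, the `pid` key, and the helper names are hypothetical. It also checks that each participant's intervals abut, so the person "re-enters" exactly when their previous version exits, and that a dropout only ever appears on a participant's last row.

```python
from itertools import groupby
from typing import Dict, List

# Hypothetical person-period rows: pid 1 dropped out after responder 2;
# pid 2 reported three responders and is right-censored (no dropout observed).
example_rows = [
    {"pid": 1, "responder": 1, "stoppedhelpseeking": 0},
    {"pid": 1, "responder": 2, "stoppedhelpseeking": 1},
    {"pid": 2, "responder": 1, "stoppedhelpseeking": 0},
    {"pid": 2, "responder": 2, "stoppedhelpseeking": 0},
    {"pid": 2, "responder": 3, "stoppedhelpseeking": 0},
]


def to_counting_process(rows: List[Dict]) -> List[Dict]:
    """Add (start, stop, event) columns: the row for responder k covers (k-1, k]."""
    return [
        {**r, "start": r["responder"] - 1, "stop": r["responder"], "event": r["stoppedhelpseeking"]}
        for r in rows
    ]


def intervals_are_consistent(cp_rows: List[Dict]) -> bool:
    """Check each participant's intervals abut and any event is on their last row."""
    ordered = sorted(cp_rows, key=lambda r: (r["pid"], r["start"]))
    for _, group in groupby(ordered, key=lambda r: r["pid"]):
        group = list(group)
        # The next version of the person enters exactly when the previous one exits.
        if any(cur["start"] != prev["stop"] for prev, cur in zip(group, group[1:])):
            return False
        # Every row except the last must be censored (event == 0).
        if any(r["event"] for r in group[:-1]):
            return False
    return True


cp = to_counting_process(example_rows)
for r in cp:
    print(r["pid"], (r["start"], r["stop"]), r["event"])
print(intervals_are_consistent(cp))
```

With these columns, a counting-process Cox model in R could be written as `coxph(Surv(start, stop, event) ~ gender + liked, data = df)`, where `df`, `gender`, and `liked` are hypothetical names. A participant whose rows all have `event = 0` is only "lost" after their last interval, which gives the participant-level censoring the question is after.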