**Background**

I'm designing a study that models time-to-event for two groups of study subjects: people who receive a treatment and those who do not. I'm fairly new to applied survival analysis, so I've chosen Cox regression for this analysis: first, because its intuition carries over nicely from other regression models I'm more familiar with, and second, because I'd like to compare these two groups in light of other covariates. I'm conducting my analyses in `R`, using the `survival` package by Terry Therneau.

I've set up my data in so-called "counting process" format (or so I've seen it called in my survival analysis texts). Here's a dummy version of the dataset I just worked up in Excel:

As you can see, this represents one subject's (`ID` number 1) records. You've got `time1` and `time2`, a `treatment` indicator, an `event` indicator, and a couple of other covariates.

**The Problem / Question**

Many of the subjects in my dataset have lots of data (rows) from the time *before* they were first treated, and you can see that that's the case for Mr. 1 in the table above: before he receives the treatment in row 3, we have 2 rows' worth of data. So: **Would the inclusion of this "pre-treatment" data somehow bias my outcome estimates (hazard ratios) for treatment? In other words, should I remove it?**

Since the survival time I'm interested in is a subject's time from treatment until the event or study end (censoring), my intuition tells me that I ought to exclude the pre-treatment data from my analytic dataset. In other words, I don't really care what happened to someone before they received the treatment. But I don't have much more than my intuition to go on here, and I've been going around in circles on this for the last couple of hours.

Do I keep pre-treatment data in, or cut it out?

## Best Answer

In a survival analysis, the time at risk is determined by the hypothesis under investigation. You can ask yourself: “When does my experiment start?” For a trial, time 0 is the start of the trial; all prior survival is ignored. For an observational study, time 0 may be clearly defined (e.g., after a surgery) or can be less clear (e.g., appearance in a clinic). Simply having lived prior to the study is not a sufficient reason to include that time as part of the study.
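
If time 0 is taken to be the start of treatment, the idea above could be sketched in base R roughly as follows. This is only an illustration under assumptions: the data frame `d` and all of its values are made up, and only the column names (`ID`, `time1`, `time2`, `treatment`, `event`) follow the question.

```r
# Hypothetical counting-process data for two treated subjects:
# one row per (time1, time2] interval, with treatment switching from 0 to 1.
d <- data.frame(
  ID        = c(1, 1, 1, 1, 2, 2, 2),
  time1     = c(0, 5, 10, 15, 0, 4, 9),
  time2     = c(5, 10, 15, 22, 4, 9, 20),
  treatment = c(0, 0, 1, 1, 0, 1, 1),
  event     = c(0, 0, 0, 1, 0, 0, 0)
)

# Keep only the rows observed under treatment (drops pre-treatment rows).
treated <- d[d$treatment == 1, ]

# Time 0 for each subject: the start of the first interval under treatment.
t0 <- tapply(treated$time1, treated$ID, min)

# Restart each subject's clock at treatment start.
post <- treated
shift <- as.numeric(t0[as.character(post$ID)])
post$time1 <- post$time1 - shift
post$time2 <- post$time2 - shift
rownames(post) <- NULL

print(post)
```

The re-based intervals in `post` could then be passed to `coxph()` from the `survival` package using `Surv(time1, time2, event)` on the left-hand side of the formula. Subjects who are never treated would need their own time 0, defined by the study design, which this sketch does not address.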