Regression – Start Time Requirements or Assumptions for Survival Analysis

regression survival

We have prospective data from an observational registry and wish to consider the effects of a gene on time to cardiovascular events. The data include standard variables like age, gender, … and also the date (age) of a cardiovascular event.

In general terms, what are the conditions, assumptions, or requirements for the starting time in a survival analysis? Entry into the registry is at random times for each individual (as he or she is recruited into the registry).

There is obviously no date at which the registry participants' genes changed, and there is no intervention. Is it valid to use the date of enrollment in the registry as the start time for the survival analysis?

We (the group working on this) are asking because some work has already been done using the enrollment date as the start of the survival interval, and there is now some disagreement as to whether this is a valid approach.

Best Answer

The starting time of the study is immaterial: it's just an origin for the clock. What you want to consider are the states in which the subjects can be found and the ages at which they transition from one to another. In this situation a minimum set of states would be

  • [Born]: "Born with gene." This always happens at age 0, of course.
  • [Enrolled]: "Enrolled in study."
  • [Endpoint]: "Cardiovascular event identified."
  • [Death]: "Death."

(This framework will allow multiple "endpoint" states to be modeled.)

The multistate analysis supposes there is a transition probability from some of these states to others. The relevant ones would be

  • [Born] --> [Death]. These account for people who never enrolled.
  • [Born] --> [Endpoint]. Are you considering these people? Are they even allowed into the study?
  • [Born] --> [Enrolled]. These are all the people selected for the study (who haven't died and don't already exhibit the cardiovascular disease).
  • [Enrolled] --> [Endpoint]. These are people in the study diagnosed with a cardiovascular disease.
  • [Enrolled] --> [Death]. These people died in the study without a diagnosis of cardiovascular disease.
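
To make the bookkeeping concrete, here is a minimal sketch of how the states and transitions listed above could be encoded so that each subject's history is checked against the model. The names `State`, `ALLOWED`, and `transitions`, and the example subject, are hypothetical and only for illustration; they are not from the registry or from any package.

```python
from enum import Enum


class State(Enum):
    BORN = "Born"
    ENROLLED = "Enrolled"
    ENDPOINT = "Endpoint"
    DEATH = "Death"


# The five transitions listed in the answer; anything else is rejected.
ALLOWED = {
    (State.BORN, State.DEATH),
    (State.BORN, State.ENDPOINT),
    (State.BORN, State.ENROLLED),
    (State.ENROLLED, State.ENDPOINT),
    (State.ENROLLED, State.DEATH),
}


def transitions(history):
    """Turn an age-ordered list of (age, State) visits into (from, to, age) moves."""
    moves = []
    for (_, src), (age, dst) in zip(history, history[1:]):
        if (src, dst) not in ALLOWED:
            raise ValueError(
                f"transition {src.value} -> {dst.value} is not in the model"
            )
        moves.append((src, dst, age))
    return moves


# Hypothetical subject: enrolled at age 48, cardiovascular event at age 61.
subject = [(0, State.BORN), (48, State.ENROLLED), (61, State.ENDPOINT)]
moves = transitions(subject)
```

Recording the age at each move, rather than time since enrollment, is what later lets you count who is at risk for each transition at a given age.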

The Nelson-Aalen estimator can be generalized to estimate the rates of these transitions. It's a simple estimator: at each event time it takes the ratio of the number of events occurring to the number of people at risk for that event, and sums these ratios. The conclusion of the recent article in The American Statistician (TAS), "Two Pitfalls in Survival Analyses of Time-Dependent Exposure," is that if you get your multistate model wrong, you will miscount the number of people at risk in various states and that will bias the results. Its message is clear: get the multistate model right. If the study truly is prospective, that is, if you identify people with the gene at birth and follow them, then there is no question about the right model. Similarly, if enrollment in the study is independent of the presence of the gene, there will be no bias. Otherwise, this framework calls out for incorporating the study selection probabilities into the model and shows how to account for deaths and prior disease before enrollment was possible.
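
As a rough illustration of why the risk-set count matters, the sketch below computes a Nelson-Aalen cumulative hazard for the [Enrolled] --> [Endpoint] transition on the age scale, counting a subject as at risk only between enrollment and exit. It then repeats the calculation as if everyone had been under observation since birth. The function name `nelson_aalen` and the four subjects are made up for illustration; this is a plain-Python sketch under those assumptions, not the method of the cited article.

```python
def nelson_aalen(entry_ages, exit_ages, events):
    """Cumulative hazard at each distinct event age.

    A subject is at risk at age t if entry_age < t <= exit_age.
    events[i] is 1 if the subject's exit is an event, 0 if censored.
    """
    event_ages = sorted({x for x, e in zip(exit_ages, events) if e})
    cumulative = 0.0
    result = []
    for t in event_ages:
        d = sum(1 for x, e in zip(exit_ages, events) if e and x == t)
        n = sum(1 for a, x in zip(entry_ages, exit_ages) if a < t <= x)
        if n > 0:
            cumulative += d / n  # events divided by number at risk
        result.append((t, cumulative))
    return result


# Hypothetical data: age at enrollment, age at exit, event indicator.
entry = [40, 50, 55, 60]
exit_ = [52, 58, 70, 65]
event = [1, 0, 1, 1]

# Risk sets that respect delayed entry (enrollment age).
delayed = nelson_aalen(entry, exit_, event)

# Miscounted risk sets: everyone treated as observed from age 0.
naive = nelson_aalen([0, 0, 0, 0], exit_, event)
```

With these made-up numbers, the first event at age 52 has two people at risk under delayed entry but four under the naive count, so the two estimates differ from the first step onward.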

This paper also illustrates a nice tool for analyzing these subtleties: the Lexis diagram. (Look at the figures at the end of this rather technical paper.) I believe these diagrams can be produced with the Epi package in R. You might find them helpful for discussing the appropriate model with your colleagues.

ASA members and people with university library privileges probably already have online access to this article: it's worth reading.
