Survival – Correct Treatment of Censoring and Observation Periods in Longitudinal Survival Analysis

I have longitudinal retrospective data and want to perform Cox proportional hazards regression. Before I do, I would be grateful if the CrossValidated community could sanity-check my understanding of how I'm treating censoring, death, loss to follow-up, and patients leaving the study during the observation period.

My definitions:

Index – the date the patient enrols in the study (this differs between patients).
Observation period (aka the study period, study observation, etc.) – 5 years (1825 days)
Event – I am interested in some event, e.g., remission. status=1 if the event happens during the five-year observation from index.
end-research-date – a hard threshold put in place for whatever reason (e.g., to avoid reading in values during a pandemic) to prevent any records from being read beyond this point.
status – the variable encoding whether an event happened (1 = it happened, 0 = it did not happen; please see the updated text at the bottom of this question).
Time-to-event (aka survival time) – the time in days from the index-date to the date of the event. This is stored as duration.

I would be grateful for a sanity check on whether I'm encoding the data correctly for the following scenarios.

Scenario 1 – Lost-to-follow-up

If a patient's survival time reaches the observation duration (they were observed for the complete 1825-days) and the event has not happened, I'm assuming that the patient is lost-to-follow-up and they are right-censored.

On that note, suppose a patient reaches the end of the observation period and the event has not been observed. Is it correct to say that the patient has been lost-to-follow-up and that the recorded survival time should be the observation duration (1825 days), and not the observation duration plus the number of days from the end of the observation period to the event date, e.g., 0 → 1825 (observation ends) → 2025 (event)?

Scenario 2 – death during observation

If a patient dies during their observation period but before an event is observed, should I:

remove the patient record altogether, or,
record the survival time up to death and censor the patient, i.e., set status to 0: patient=Nth, status=0, duration=deathDate - indexDate?

Scenario 3 – patient leaves during observation

Similar to a patient dying. The patient is enrolled (index date) and is meant to be observed for five years. During the observation period, there is a mark on their record to say that the patient has left. Should this be treated like death, i.e., survival time is taken from the index date to the patient's leaving date, and the event status is set to 0 (censored)? This would be encoded as patient=Nth, status=0, duration=leavingDate - indexDate

Scenario 4 – Observation duration/period longer than patient history.

Should a patient be excluded from the study if the patient's longitudinal record is shorter than the observation period/duration?

For example, we have records going back to 2000 and up to 1st January 2022 (22 years of data). We want to use an observation period of five years to observe a potential event. A patient is enrolled but has an index date of 10 March 2021. They do not have enough time for this study. At the moment, I am excluding these patients when I define my cohort well before performing any statistical analysis. I see this as an exclusion criterion. Is this correct?

Conceptual Questions

If several patients die, leave the study, or are lost-to-follow-up, and the cohort sample size is large enough, could these patient records be dropped from the Cox PH calculations? Or, put another way, must I always include censored data? My gut feeling is that all data, including censored data, should be included.
Are my concepts of lost-to-follow-up, death (when death isn't the event), and leaving the study correct? That is, they all result in the patient being censored, and the recorded survival times are the complete observation time (they reached the end), the duration from index to death, and the duration from index to leaving, respectively.

In response to comments

Thank you for the comments. In response to the accepted answer, I'm making the following changes:

Scenario 1 – Lost-to-follow-up UPDATED

Each patient is followed from the index date to whichever comes first: an event, the end-of-research date, or the end of the observation period. On experiencing an event, status is set to a particular outcome value (see scenarios 2 and 3). If the subject was followed to either the end of the observation period or the end-of-research date (whichever came first) without an event, the subject is lost-to-follow-up, status=0 for that subject, and the time-to-event (aka survival time) is the last observable time point on their record.
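The follow-up rule described above can be sketched in a few lines of Python. This is only an illustration: the function name `encode_follow_up` and its arguments are hypothetical, and it assumes the patient's record runs all the way to the end of follow-up when no event is seen (if the last recorded contact is earlier, that date would be used instead).

```python
from datetime import date, timedelta

OBSERVATION_DAYS = 1825  # five-year observation period from the question


def encode_follow_up(index_date, end_research_date, event_date=None):
    """Return (duration_in_days, status) for one patient.

    status is 1 if the event falls inside follow-up, otherwise 0
    (right-censored at the end of follow-up).
    """
    window_end = index_date + timedelta(days=OBSERVATION_DAYS)
    # Follow-up stops at whichever comes first.
    follow_up_end = min(window_end, end_research_date)
    if event_date is not None and index_date <= event_date <= follow_up_end:
        return (event_date - index_date).days, 1
    return (follow_up_end - index_date).days, 0


index = date(2015, 1, 1)
end_research = date(2022, 1, 1)

# Event one year after index: observed.
observed = encode_follow_up(index, end_research, date(2016, 1, 1))
# Event 2025 days after index: outside the window, so censored at 1825 days.
late = encode_follow_up(index, end_research, index + timedelta(days=2025))
print(observed, late)
```

An event that occurs after the window closes is not counted; the patient simply contributes 1825 event-free days.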

Scenario 2 and 3 – death during observation and patient leaves during observation UPDATED

For example, I used "remission" as my event of interest (status=1), but I'm also interested in whether the subject left the study or died during the observation period. Initially, upon detection, I was going to right-censor (status=0) these individuals. However, death and leaving a study are what's known as competing risk events, i.e., any event that might interrupt the observation of an event of interest. A death event becomes status=2 and a leaving event becomes status=3, and the time from index to either event is recorded as the survival time. To calculate the HR for competing risks in a Cox model, it is simply a case of introducing strata per possible event. Therefore, in very rough pseudocode: outcome ~ cov1 + cov2 + ... + covN + strata(status)

Scenario 4 – Observation duration/period longer than patient history UPDATED

Given that I have a large sample size, I am performing a sensitivity analysis to see what difference removing those patients with insufficient observation periods makes to the final hazard ratio.

Best Answer

There's seldom anything to be gained by throwing away information. You do have to include it correctly in your analysis, however. Sometimes, as described below, censoring is not the correct choice.
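As a rough illustration of why dropping censored records is harmful, here is a minimal Kaplan-Meier sketch in plain Python on made-up toy data (the function name `km_survival` and the numbers are assumptions for illustration only). Censored subjects still count in the risk set until they leave, so removing them changes the estimate.

```python
def km_survival(durations, events, t):
    """Kaplan-Meier estimate of the event-free probability at time t.

    events[i] is 1 if the event was observed, 0 if right-censored.
    """
    surv = 1.0
    for u in sorted(set(durations)):
        if u > t:
            break
        # Everyone with duration >= u is still at risk just before u.
        at_risk = sum(1 for d in durations if d >= u)
        n_events = sum(1 for d, e in zip(durations, events) if d == u and e == 1)
        surv *= 1 - n_events / at_risk
    return surv


durations = [2, 3, 5, 5, 8]
events = [1, 0, 1, 0, 0]

# All records, censored ones included.
full = km_survival(durations, events, t=5)

# Same data with the censored records thrown away.
kept = [(d, e) for d, e in zip(durations, events) if e == 1]
dropped = km_survival([d for d, _ in kept], [e for _, e in kept], t=5)

print(full, dropped)
```

With the censored records included, the event-free estimate at time 5 is 8/15; with them dropped it collapses to 0, because every remaining subject had the event.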

I still refer frequently to the Leung et al. review on censoring, even though it's a quarter-century old. Read it for more insight on what follows.

Scenario 1

Scenario 1 is what's described as "Type I censoring" by Leung et al.:

a study in which every subject is under observation for a specified period $C_0$ or until failure.

So I wouldn't call those "lost-to-follow-up." It's just an accepted type of study design with censoring. Yes, count them as right-censored as of their last follow-up within your time window.

Scenarios 2 and 3

There is a big risk here of informative censoring if your event is "remission"* while some individuals die (Scenario 2) or might leave the study due to illness (e.g., in Scenario 3). At the least, this would require a competing-risks analysis if each event is absorbing (no return to the study after the event). As the vignette mentioned by Frank Harrell in a comment says in Section 2.2 (page 8):

A common mistake with competing risks is to use the Kaplan-Meier separately on each event type while treating other event types as censored.

So for Scenario 2 you certainly can't remove those who died, and you can't censor on death dates. Death needs to be treated as a competing event.
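To make the quoted mistake concrete, here is a small plain-Python sketch on made-up toy data, using the question's coding (0 = censored, 1 = remission, 2 = death). It compares the Aalen-Johansen cumulative incidence of remission, which treats death as a competing event, with the naive "1 minus Kaplan-Meier" figure that censors deaths. The function names and data are hypothetical and illustrative only.

```python
def cumulative_incidence(durations, statuses, cause):
    """Aalen-Johansen cumulative incidence of `cause` by end of follow-up."""
    at_risk = len(durations)
    surv = 1.0  # probability of being free of ANY event just before time t
    cif = 0.0
    for t in sorted(set(durations)):
        here = [s for d, s in zip(durations, statuses) if d == t]
        n_cause = sum(1 for s in here if s == cause)
        n_any = sum(1 for s in here if s != 0)
        cif += surv * n_cause / at_risk
        surv *= 1 - n_any / at_risk
        at_risk -= len(here)
    return cif


def naive_one_minus_km(durations, statuses, cause):
    """1 - Kaplan-Meier for `cause`, wrongly censoring all other events."""
    at_risk = len(durations)
    surv = 1.0
    for t in sorted(set(durations)):
        here = [s for d, s in zip(durations, statuses) if d == t]
        n_cause = sum(1 for s in here if s == cause)
        surv *= 1 - n_cause / at_risk
        at_risk -= len(here)
    return 1 - surv


durations = [100, 200, 300, 400, 500, 600]
statuses = [2, 1, 2, 1, 0, 0]  # 0 = censored, 1 = remission, 2 = death

cif = cumulative_incidence(durations, statuses, cause=1)
naive = naive_one_minus_km(durations, statuses, cause=1)
print(cif, naive)
```

Here two of six patients reach remission, and the Aalen-Johansen estimate is 1/3. Censoring the deaths instead gives 7/15, which overstates the chance of remission because it implicitly assumes the patients who died could still have gone on to remit.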

You have to apply your understanding of the subject matter with respect to how to treat Scenario 3. Is censoring under that Scenario informative about future event times?

Scenario 4

If individuals with less than 5 years of follow-up were still enrolled and available in principle for data collection at the time that you chose to end the study, they can still provide information up through their last observation date. As you describe it, that censoring wouldn't seem to be informative. Those individuals could be considered as having right-censored event times as of their last observation. That they didn't get all the way to 5 years doesn't matter; that's an advantage of survival analysis that appropriately incorporates information about censoring times.
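Using the dates from the question (index date 10 March 2021, records ending 1 January 2022), a patient with no observed event would be encoded roughly as follows. The helper name `censored_duration` is hypothetical, and the sketch assumes the patient was still under observation when the data ended.

```python
from datetime import date


def censored_duration(index_date, data_end, window_days=1825):
    """Days of follow-up contributed by a patient with no observed event."""
    available = (data_end - index_date).days
    # Follow-up cannot exceed the five-year observation period.
    return min(available, window_days)


duration = censored_duration(date(2021, 3, 10), date(2022, 1, 1))
status = 0  # right-censored: no event seen before the data ended
print(duration, status)  # 297 0
```

The patient stays in the cohort and contributes 297 days at risk rather than being excluded.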

*If the event is "remission" you really have a type of cure model, as that event (unlike death) presumably doesn't eventually happen to all.