How to Define Survival Time for Prediction Model – Using Cox Regression in Datasets

cox-model predictive-models survival

I am trying to develop a clinical risk score using a prediction model to predict the risk of developing a severe outcome (a type of tumour) that usually occurs in older adults from less severe clinical manifestations of the disease usually occurring in children (all chosen based on clinical expert knowledge on the disease).

The data I'm using comes from an ongoing clinical database using both prospective and retrospective data collection (If patient x joins at 50 years old but was diagnosed with clinical manifestation a at 25 years old, a will be entered in the data with the date at which it was diagnosed). All individuals in it have the same genetic disease. Participants joined the study at varying ages, some as babies and some over 60 years old.

The variables my data contains include the date of birth for every participant, their age when they joined the study and the date of their last follow-up. As for the clinical manifestations of the disease, they are all binary variables (Present/absent), while the one used as my outcome also comes with a second variable stating the age at diagnosis.

I'm having trouble defining the follow-up period in my data. Since I have the participants' birthdates and know that the outcome never happens before the age of 10, I'm thinking of choosing either birth or age 10 as the starting point.

I'm struggling with the end of follow-up since it's an ongoing database, meaning that there is no date at which the study ends.

One possibility is simply to use the date of the last visit as the censoring time and the date of outcome diagnosis as the event time. Given that the oldest person diagnosed with the outcome was 69 years old, but that several people who did not develop the outcome were followed well into their eighties, does this pose a problem? Is there supposed to be a specific cutoff usually for the end of follow-up, or is it just what we're used to seeing for studies with a specific end time?

I'm confused since, in this case, we're not interested in something like remission after 5 years; we are just interested in knowing how certain predictors' presence affects the risk of developing the outcome.

So my question is: is this a situation in which Cox regression, or even survival analysis in general, might not be the best approach?

If Cox regression is a good choice, how is follow-up usually defined in a situation like this?

Best Answer

Is there supposed to be a specific cutoff usually for the end of follow-up, or is it just what we're used to seeing for studies with a specific end time?

There is no need to have "a specific end time." An advantage of survival analysis is that you can use all the information that you have with respect to outcome times, even from participants with very long observation times without events. If you are using age as your time scale and the last event happened at age 69, then participants observed past age 69 will provide information up to age 69 in the Cox model, because they remain in the risk set at every event age up to that point. There's no reason for (or advantage to) censoring those participants at ages lower than what you have observed. As more data become available and events are found at later ages, data from those individuals will help refine the model.
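As a rough illustration of that bookkeeping, here is a minimal sketch of how each participant's exit age and event indicator could be derived on the age time scale. The field names (`birth`, `last_follow_up`, `outcome_age`) and the two participants are hypothetical, not taken from the question's database; the only assumption is that each record has a birth date, a last follow-up date, and an age at outcome diagnosis when the outcome occurred.

```python
from datetime import date

DAYS_PER_YEAR = 365.25  # approximate conversion from days to years


def age_at(birth: date, when: date) -> float:
    """Age in years at a given calendar date."""
    return (when - birth).days / DAYS_PER_YEAR


def exit_age_and_event(birth, last_follow_up, outcome_age=None):
    """Return (exit_age, event) on the age time scale.

    outcome_age is the age at outcome diagnosis, or None if the
    outcome has not been observed (right-censored at last follow-up).
    """
    if outcome_age is not None:
        return outcome_age, 1
    return age_at(birth, last_follow_up), 0


# Hypothetical participants with made-up values
participants = [
    {"id": "A", "birth": date(1940, 1, 1),
     "last_follow_up": date(2025, 1, 1), "outcome_age": None},
    {"id": "B", "birth": date(1950, 6, 15),
     "last_follow_up": date(2020, 6, 15), "outcome_age": 69.0},
]

rows = {
    p["id"]: exit_age_and_event(p["birth"], p["last_follow_up"], p["outcome_age"])
    for p in participants
}
```

Participant A is simply censored at about age 85, with no artificial cutoff at 69; nothing special has to be done for long event-free follow-up.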

As discussed in a previous question, you do need to treat the study-entry age as a left-truncation time for those who enter without having experienced the event, because no such individual provides information about potential events that might have occurred prior to the age at study entry.
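To make the effect of left truncation concrete, the sketch below builds the risk set at each event age, counting a participant as at risk only between study entry and exit. It assumes the usual counting-process convention `entry_age < t <= exit_age`; the four subjects and their ages are invented for illustration.

```python
def risk_set(subjects, t):
    """IDs of subjects at risk at age t under delayed entry.

    A subject counts only after study entry and up to (and including)
    their exit age: entry_age < t <= exit_age.
    """
    return [s["id"] for s in subjects if s["entry_age"] < t <= s["exit_age"]]


# Hypothetical subjects (ages in years, made-up values)
subjects = [
    {"id": "A", "entry_age": 0.5, "exit_age": 30.0, "event": 1},
    {"id": "B", "entry_age": 50.0, "exit_age": 69.0, "event": 1},
    {"id": "C", "entry_age": 62.0, "exit_age": 85.0, "event": 0},
    {"id": "D", "entry_age": 5.0, "exit_age": 40.0, "event": 0},
]

# The Cox partial likelihood only compares subjects at observed event ages
event_ages = sorted(s["exit_age"] for s in subjects if s["event"] == 1)
risk_sets = {t: risk_set(subjects, t) for t in event_ages}
```

Here B and C are excluded from the risk set at age 30 because they had not yet entered the study, while C, followed to age 85, still contributes at the event at age 69. Survival software typically takes the same information as an (entry, exit, event) triplet per subject, for example `Surv(entry_age, exit_age, event)` in R's survival package.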

Some cases

If patient x joins at 50 years old but was diagnosed with clinical manifestation a at 25 years old, a will be entered in the data with the date at which it was diagnosed

might need special treatment. For them, you might only have an upper limit to the age at which the event of interest occurred; it might have been earlier than 25 years old in that example. Those event ages should be treated as left-censored unless the specific age at the event is known. The Klein and Moeschberger text provides many examples of how to distinguish the different types of censoring and truncation.
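One common way to keep these cases apart is to store each event age as a (lower, upper) interval. The sketch below is only an illustration of that representation; the status labels `"exact"`, `"right"`, and `"left"` and the function name are hypothetical, and how the intervals are then passed to a model depends on the software used.

```python
import math


def interval_bounds(age, status):
    """Return (lower, upper) bounds on the true event age.

    status is a hypothetical label:
      "exact" - event age is known exactly
      "right" - no event by this age (right-censored)
      "left"  - event happened at or before this age (left-censored)
    """
    if status == "exact":
        return (age, age)
    if status == "right":
        return (age, math.inf)
    if status == "left":
        return (0.0, age)
    raise ValueError(f"unknown status: {status!r}")


# Made-up examples
known = interval_bounds(25.0, "exact")         # diagnosed exactly at 25
not_yet = interval_bounds(50.0, "right")       # event-free when last seen at 50
before_entry = interval_bounds(25.0, "left")   # occurred some time by age 25
```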