Cox Model – How to Model Treatment Variable When Timing is Unknown

cox-model logistic rms time-varying-covariate

We have cancer medical registry data, including the date of diagnosis, treatment, and follow-up information (e.g. date of death). However, we only know the type of treatment each person received. We have no information on WHEN they received treatment, nor on dose, frequency, etc. Whilst this is not optimal, I guess it is better than not knowing at all. However, we are uncertain how to model such treatment variables in, e.g., a Cox model. It seems to me that a fixed-in-time covariate is not appropriate (e.g. treatment may begin well after time zero). We cannot properly use a time-varying covariate either, as the timing is precisely the information we lack. We thought of running a logistic regression model instead, but that seems wasteful since we do know the time to outcome. Any advice would be great!

Update
Only information on the primary cancer and primary treatment is known. We don't have any information on subsequent treatment if recurrence occurred.
We have the date of initial diagnosis of cancer (in situ) and also the dates of diagnosis of invasive cancer and of death (if either occurred).

Suppose we want to use a Cox model for time FROM in situ diagnosis to event (e.g. invasive cancer or death from cancer). We are unsure where to set time zero in our situation. If we set time zero equal to the in situ diagnosis date (unique to each person), then clearly nobody has actually had any treatment yet at that time!
Or we could make some assumptions about the likely timeframe in which treatment USUALLY occurs following diagnosis and set time zero to after that? I was thinking that this might lose power, and also that different treatments are likely to start at different times (e.g. maybe chemo, then radio, etc.), making it very confusing. I'm thinking along the lines of the comment from @EdM on this question:
Cox model: advice on constructing time varying exposure to drugs
where the comment says…
Any fixed-in-time covariate should have a value that was in place at time=0 in the survival analysis.

Despite this statistical requirement, might it still be "valid" to set time zero equal to the in situ diagnosis date and think of it as: the patient will at some time over follow-up experience treatment x, y, z, etc., even if they have not yet experienced any treatment at all at time zero? And then model these treatment variables as just ordinary main effects (i.e. fixed in time rather than time varying) in the Cox model?

Best Answer

Usual practice is to define time = 0 in studies of primary cancer with this type of data as the date of diagnosis. That's typically also the date of most interest to the patient. For example, in the Clinical Data Resource for The Cancer Genome Atlas, the Methods state:

OS [overall survival] is the period from the date of diagnosis until the date of death from any cause.

Although the therapy is not yet defined at that date, there is no serious problem in using the ultimate choice of primary therapy as a fixed-in-time predictor in a Cox survival model. Therapy typically begins within a few weeks of diagnosis, a period during which there are usually few deaths. In particular, for a Cox model the absolute times don't matter, only the order in which the event and censoring times occur. Unless there are many early deaths, the results will be interpretable as the post-diagnosis survival of patients who received (or at least were assigned to) each type of therapy. That interpretation then includes the delays typically involved in choosing and providing each type of therapy.
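To illustrate the point that only the ordering of times enters a Cox model, here is a minimal sketch in plain Python. It evaluates the Cox partial log-likelihood for a single binary treatment covariate on a small made-up data set, then re-evaluates it after two order-preserving changes to the time scale. The function name `cox_partial_loglik`, the toy data, and the value of `beta` are all assumptions for illustration, not anything from the registry in the question, and the sketch assumes no tied event times.

```python
import math

def cox_partial_loglik(times, events, x, beta):
    """Cox partial log-likelihood for one covariate (assumes no tied event times)."""
    ll = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if not d_i:
            continue  # censored subjects contribute only through the risk sets
        # Risk set: everyone still under observation at this event time
        risk = [j for j, t_j in enumerate(times) if t_j >= t_i]
        ll += beta * x[i] - math.log(sum(math.exp(beta * x[j]) for j in risk))
    return ll

# Made-up toy data: days from diagnosis, event indicator (1 = death),
# and the primary therapy ultimately received (1 = therapy A, 0 = therapy B)
times  = [30, 45, 60, 90, 120, 200, 250, 400]
events = [1, 0, 1, 1, 0, 1, 1, 0]
treat  = [0, 1, 0, 1, 1, 0, 1, 1]

# Strictly increasing transformations keep the ordering, hence the risk sets
shifted = [t + 21 for t in times]        # constant offset of 21 days
logged  = [math.log(t) for t in times]   # log time scale

beta = -0.5  # arbitrary illustrative coefficient
ll_orig  = cox_partial_loglik(times, events, treat, beta)
ll_shift = cox_partial_loglik(shifted, events, treat, beta)
ll_log   = cox_partial_loglik(logged, events, treat, beta)

print(ll_orig, ll_shift, ll_log)  # all three values coincide
```

Because the partial likelihood is the same on all three time scales, so is the estimated coefficient. This only shows the invariance to the time scale. It does not by itself show that coding the eventual therapy at time zero is harmless, which rests on the argument above that few events occur before therapy begins.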

In some circumstances and depending on how data are coded, you might want to omit very early deaths when the definition of therapy received might be ambiguous. For example, a patient might have received primary surgery with a recommendation for adjuvant radiotherapy, but have died from surgery complications before the course of radiation was finished. Depending on the specific question you are asking and your knowledge of the subject matter, you might want to omit such individuals with unexpectedly early deaths from the study, while clearly explaining that choice in your report.
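One way such an exclusion could be sketched in plain Python is shown below. The record layout, the helper name `drop_early_deaths`, and the 30-day window are hypothetical choices for illustration; the appropriate window, if any, depends on the clinical question. Subjects censored early are kept here, since only early deaths make the therapy definition ambiguous in the sense described above.

```python
def drop_early_deaths(records, window_days):
    """Exclude subjects who died fewer than window_days after diagnosis.

    Returns the retained records and the number excluded, so the
    exclusion can be reported explicitly.
    """
    kept = [
        r for r in records
        if not (r["event"] == 1 and r["time"] < window_days)
    ]
    return kept, len(records) - len(kept)

# Made-up records: time in days from diagnosis, event = 1 for death, 0 for censoring
records = [
    {"id": 1, "time": 12,  "event": 1, "therapy": "surgery+radiation"},
    {"id": 2, "time": 20,  "event": 0, "therapy": "surgery"},
    {"id": 3, "time": 400, "event": 1, "therapy": "surgery"},
    {"id": 4, "time": 25,  "event": 1, "therapy": "surgery+radiation"},
    {"id": 5, "time": 900, "event": 0, "therapy": "surgery+radiation"},
]

kept, n_dropped = drop_early_deaths(records, window_days=30)
kept_ids = [r["id"] for r in kept]
print(kept_ids, n_dropped)
```

Returning the count of excluded subjects alongside the filtered data makes it easy to state the exclusion in the report, as recommended above.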

What complicates comparing responses to therapy this way is that the choice of therapy is typically a function of clinical characteristics, like tumor size and spread to lymph nodes, that are themselves associated with outcome. Those problems pose much more difficulty in interpretation than the choice of time = 0.
