Solved – Intuitive explanation of censored data in a Cox model

cox-model regression

I use Cox regression (proportional hazards) to model survival for a cohort of patients. Each patient has a status indicator: alive at last follow-up, i.e. censored (0), or dead (1).

I was wondering how, intuitively, Cox regression uses the censored data. I thought the Cox model would simply ignore patients who are still alive (0), but apparently it is not that simple.

For example, I fitted the Cox model to all patients (both alive and dead), and then to only the patients recorded as dead. The results were quite different (both the likelihood ratio test for the whole model and the Wald tests for the individual covariates).

In brief: intuitively, how does the Cox model use the censored data, and how does censoring affect the results?

Best Answer

Inference in survival analysis is based on the survival likelihood.

The survival likelihood differs from the classical likelihood due to the presence of censored data:

  • the relevant information contained in an uncensored observation ($\delta = 1$) is that the event occurred at the observed time $y$; such an observation contributes to the likelihood via the density function, $f(y)$ (just as in the classical likelihood);
  • the relevant information contained in a censored observation ($\delta = 0$) is that the event time exceeds the censoring time $y$; such an observation thus contributes to the likelihood via the survival function, $S(y)$.

Thus, the survival likelihood for a sample of size N is $$ \prod_{i=1}^N f(y_i)^{\delta_i}\, S(y_i)^{1 - \delta_i} $$
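To see what the $S(y)$ terms do in practice, here is a minimal sketch that evaluates this likelihood for a deliberately simple parametric case. It assumes a constant hazard (exponential model), which is not what Cox regression assumes, and the follow-up times are made up for illustration. The function names are hypothetical.

```python
import math

def exp_log_likelihood(rate, times, events):
    """Log of prod f(y_i)^delta_i * S(y_i)^(1 - delta_i) for a constant hazard `rate`."""
    total = 0.0
    for y, delta in zip(times, events):
        log_f = math.log(rate) - rate * y   # log density: event observed at y
        log_s = -rate * y                   # log survival: event time exceeds y
        total += delta * log_f + (1 - delta) * log_s
    return total

def exp_mle(times, events):
    """Closed-form maximiser for this model: number of events / total follow-up time."""
    return sum(events) / sum(times)

# Made-up follow-up times (years) and status (1 = dead, 0 = censored)
times = [2.0, 3.0, 5.0, 8.0, 10.0, 10.0]
events = [1, 1, 1, 0, 0, 0]

# Using everyone: 3 deaths over 38 person-years
rate_all = exp_mle(times, events)

# Dropping the censored patients: 3 deaths over only 10 person-years
dead_times = [y for y, d in zip(times, events) if d == 1]
rate_dead_only = exp_mle(dead_times, [1] * len(dead_times))

print(rate_all, rate_dead_only)
```

In this toy data set the censored patients add 28 person-years of survival without a death. Discarding them throws that time away, so the estimated death rate is 3/10 instead of 3/38, almost four times higher.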


Note 1: The above likelihood is derived under working assumptions (e.g. independent and non-informative censoring).

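Note 2: The Cox model is semi-parametric and is fitted by maximising a partial likelihood rather than the full likelihood above, but the intuition carries over. In the partial likelihood, a censored patient never contributes an event term of their own, yet remains in the risk set (the comparison group in the denominator) at every event time up to their censoring time. This is why dropping the censored patients changes the fit: it discards the information that those patients survived at least that long.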
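To make Note 2 concrete, here is a minimal sketch of the Cox log partial likelihood for a single covariate. It assumes no tied event times and uses a made-up six-patient data set. The function name is hypothetical, and real software (e.g. `coxph` in R) also handles ties and maximises over $\beta$ rather than evaluating at a fixed value.

```python
import math

def cox_log_partial_likelihood(beta, times, events, x):
    """Log partial likelihood for one covariate, assuming no tied event times."""
    total = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if d_i == 0:
            continue  # censored patients contribute no event term of their own
        # Risk set: everyone still under observation at t_i, censored or not
        denom = sum(math.exp(beta * x[j]) for j, t_j in enumerate(times) if t_j >= t_i)
        total += beta * x[i] - math.log(denom)
    return total

# Made-up data: follow-up time, status (1 = dead, 0 = censored), binary covariate
times = [1, 2, 3, 4, 5, 6]
events = [1, 0, 1, 0, 1, 0]
x = [1, 1, 0, 0, 1, 0]

# All patients: risk sets at the three deaths have sizes 6, 4 and 2
ll_all = cox_log_partial_likelihood(0.5, times, events, x)

# Deaths only: risk sets shrink to sizes 3, 2 and 1
keep = [i for i, d in enumerate(events) if d == 1]
ll_dead_only = cox_log_partial_likelihood(
    0.5,
    [times[i] for i in keep],
    [events[i] for i in keep],
    [x[i] for i in keep],
)

print(ll_all, ll_dead_only)
```

The censored patients only ever appear in the denominators, but removing them changes every denominator they belonged to. The two fits are therefore maximising different functions of $\beta$, which is consistent with the different test results reported in the question.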