Why use ln-ln plot in proportional hazard test

cox-model, kaplan-meier, regression, survival

I recently began studying survival analysis and there is something I am curious about.

Why do we prefer to use the ln-ln survival curve rather than the survival curve itself in a proportional hazards test? As you know, the $\hat{S}$ we use in the ln-ln formula is the survival function estimated from Kaplan-Meier curves, so why don't we use the untransformed KM survival function?

Best Answer

Let us define the linear predictor (i.e. log hazard ratio) $\eta = X^T \beta$. Then the proportional hazards model can be written as

$h(t | \eta) = h_o(t) \exp(\eta)$

This relation is equivalent to

$S(t | \eta) = S_o(t)^{\exp(\eta)}$
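
To see why, here is a short derivation sketch. It assumes $\eta$ does not change over time, and it introduces $H_o(t) = \int_0^t h_o(u) \, du$ for the baseline cumulative hazard (notation added here, not part of the original answer), together with the standard identity $S(t) = \exp(-H(t))$:

$S(t | \eta) = \exp\left(-\int_0^t h_o(u) \exp(\eta) \, du\right) = \exp\left(-H_o(t) \exp(\eta)\right) = \left[\exp(-H_o(t))\right]^{\exp(\eta)} = S_o(t)^{\exp(\eta)}$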

Therefore, if we plot the cloglog (complementary log-log, or ln-ln as stated in the question) of the survival functions, taking $\text{cloglog}(S) = \ln(-\ln S)$, we get

$\text{cloglog} (S(t | \eta) ) = \text{cloglog} (S_o(t)) + \eta$

i.e. if the proportional hazards assumption is true, the curves should differ only by a constant vertical shift. It's much easier to visually assess whether two curves differ by an additive constant than whether one is the other raised to a constant power.
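
A quick numerical check of this claim may help. The sketch below is only an illustration: the Weibull-shaped baseline, the value of `eta`, and all variable names are assumptions chosen for the example, and `cloglog` is taken to mean $\ln(-\ln S)$. Under that definition the vertical gap equals $\eta$; under the sign-flipped convention $-\ln(-\ln S)$ it would equal $-\eta$. Either way it is constant in $t$.

```python
import numpy as np

def cloglog(s):
    """ln(-ln(s)) for survival probabilities strictly between 0 and 1."""
    return np.log(-np.log(s))

t = np.linspace(0.1, 5.0, 50)
baseline = np.exp(-0.5 * t ** 1.5)   # hypothetical baseline S_o(t), Weibull-shaped
eta = 0.7                            # hypothetical log hazard ratio
shifted = baseline ** np.exp(eta)    # S(t | eta) = S_o(t)^exp(eta)

# On the cloglog scale the two curves are separated by the same amount everywhere.
gap = cloglog(shifted) - cloglog(baseline)

# On the raw survival scale the separation changes with t.
raw_gap = baseline - shifted

print("cloglog gap range:", gap.min(), gap.max())
print("raw gap range:    ", raw_gap.min(), raw_gap.max())
```

The first range collapses to a single value, while the second does not, which is why the transformed plot is the easier one to judge by eye.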

As an example, here is simulated data that does not follow the proportional hazards model (it follows a proportional odds model instead). Looking at the cloglog plots (with the average of the two cloglog functions removed for easier comparison), we can see that the difference between these two functions is not exactly constant.

[Figure: cloglog plot]

However, looking at the two survival curves, you would have to have a much better eye than I will ever have to determine that one of these curves is not simply the other raised to a constant power.

[Figure: the two survival curves]
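
The simulated example above can be roughly reproduced as follows. This is a sketch under stated assumptions, not the code behind the original figures: it assumes a log-logistic proportional odds model with $S(t | \eta) = 1 / (1 + \exp(\eta) t)$, a hypothetical $\eta = 1$, 20,000 subjects per group, and no censoring (in which case the Kaplan-Meier estimate reduces to the fraction of subjects still event-free). All function and variable names are made up for the illustration, and `cloglog` again means $\ln(-\ln S)$.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

def simulate_prop_odds(n, eta, rng):
    """Draw n event times with S(t) = 1 / (1 + exp(eta) * t) by inverse transform."""
    u = 1.0 - rng.uniform(size=n)            # u in (0, 1], plays the role of S(T)
    return (1.0 / u - 1.0) / np.exp(eta)

def empirical_survival(times, grid):
    """Fraction still event-free at each grid point (equals KM when nothing is censored)."""
    return (times[:, None] > grid[None, :]).mean(axis=0)

def cloglog(s):
    """ln(-ln(s)) for survival probabilities strictly between 0 and 1."""
    return np.log(-np.log(s))

eta = 1.0                                    # hypothetical log odds shift
n = 20_000
t0 = simulate_prop_odds(n, 0.0, rng)         # baseline group
t1 = simulate_prop_odds(n, eta, rng)         # shifted group

# Evaluate at pooled quantiles so both survival estimates stay away from 0 and 1.
grid = np.quantile(np.concatenate([t0, t1]), [0.1, 0.3, 0.5, 0.7, 0.9])
s0 = empirical_survival(t0, grid)
s1 = empirical_survival(t1, grid)

# Under proportional hazards this gap would be flat; under proportional odds it drifts.
diff = cloglog(s1) - cloglog(s0)
print("cloglog gap at increasing times:", diff)
```

In this assumed model the theoretical gap starts near $\eta$ at early times and shrinks toward 0 at late times, so the printed values should trend downward rather than stay flat.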