What does the 'km' transform in the cox.zph function mean?

Tags: cox-model, data-transformation, hazard, proportional-hazards, r

I'm trying to understand how the cox.zph function in R works, and I don't know what the km transform means. I understand the rank transform and, obviously, the identity one, but km is still not clear to me, and reading the code for this function did not make it clearer.

Best Answer

km stands for Kaplan-Meier: time is transformed using the Kaplan-Meier estimate of the survival function,

$$\hat{S}(t) = \prod_{i: t_i \le t}\left(1-\frac{d_i}{n_i} \right)$$

where $t_i$ is a time at which at least one event occurred, $d_i$ is the number of events (e.g., deaths) at time $t_i$, and $n_i$ is the number of individuals still at risk at $t_i$, i.e., known to have survived (not yet had an event or been censored) up to time $t_i$.
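To make the formula concrete, here is a minimal sketch of the estimator in plain Python (not R, and not the survival package's implementation). It assumes right-censored data given as a list of follow-up times plus a 0/1 event indicator; the names `kaplan_meier`, `times`, and `events` are illustrative only.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t_i, S_hat(t_i)), ...] at each distinct event time.

    times  : follow-up time for each subject
    events : 1 if the event (e.g. death) was observed, 0 if censored
    """
    # d_i: number of observed events at each distinct event time
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    surv = 1.0
    curve = []
    for t_i in sorted(deaths):
        # n_i: subjects still at risk at t_i (follow-up time >= t_i)
        n_i = sum(1 for t in times if t >= t_i)
        surv *= 1.0 - deaths[t_i] / n_i  # multiply in the factor (1 - d_i / n_i)
        curve.append((t_i, surv))
    return curve


if __name__ == "__main__":
    # toy data: the subjects with times 2 and 4 are censored
    for t, s in kaplan_meier([1, 2, 2, 3, 4, 5], [1, 1, 0, 1, 0, 1]):
        print(t, round(s, 3))
```

On the toy data the estimate steps down at t = 1, 2, 3 and 5 to roughly 0.833, 0.667, 0.444 and 0; the censored subjects count toward $n_i$ up to their censoring time but never contribute to $d_i$.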

Here is a quote from the paper "Cox Proportional-Hazards Regression for Survival Data":

> Tests and graphical diagnostics for proportional hazards may be based on the scaled Schoenfeld residuals; these can be obtained directly as residuals(model, "scaledsch"), where model is a coxph model object. The matrix returned by residuals has one column for each covariate in the model. More conveniently, the cox.zph function calculates tests of the proportional-hazards assumption for each covariate, by correlating the corresponding set of scaled Schoenfeld residuals with a suitable transformation of time [the default is based on the Kaplan-Meier estimate of the survival function, $K(t)$].
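The idea in that passage, correlating residuals with a transformation of time, can be sketched as below. This is a hedged Python illustration, not the cox.zph source: the helper names are hypothetical, real residuals would come from residuals(model, "scaledsch"), and we assume (from our reading of the survival package) that km maps an event time $t$ to $1 - \hat{S}(t^-)$, one minus the Kaplan-Meier estimate just before $t$. The plain Pearson correlation only conveys the intuition; cox.zph itself reports a test statistic, which this sketch does not reproduce.

```python
from collections import Counter
from math import sqrt


def km_left(times, events):
    """Map each distinct event time t to S_hat(t-), the KM estimate just before t."""
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    surv, before = 1.0, {}
    for t_i in sorted(deaths):
        before[t_i] = surv  # value before the deaths at t_i are counted
        n_i = sum(1 for t in times if t >= t_i)  # at risk at t_i
        surv *= 1.0 - deaths[t_i] / n_i
    return before


def transform_time(times, events, method="km"):
    """Transformed time for every observed event (one residual per event)."""
    event_times = sorted(t for t, e in zip(times, events) if e == 1)
    if method == "identity":
        return [float(t) for t in event_times]
    if method == "rank":
        # average ranks for tied times, like R's rank()
        ranks, pos = {}, 1
        for t, c in sorted(Counter(event_times).items()):
            ranks[t] = pos + (c - 1) / 2
            pos += c
        return [ranks[t] for t in event_times]
    if method == "km":
        s_before = km_left(times, events)  # assumed definition: 1 - S_hat(t-)
        return [1.0 - s_before[t] for t in event_times]
    raise ValueError(f"unknown method: {method}")


def pearson_corr(x, y):
    """Ordinary Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


if __name__ == "__main__":
    times = [1, 2, 2, 3, 4, 5]
    events = [1, 1, 0, 1, 0, 1]
    resid = [0.3, -0.1, 0.2, -0.4]  # hypothetical residuals, one per event
    for m in ("identity", "rank", "km"):
        print(m, round(pearson_corr(transform_time(times, events, m), resid), 3))
```

With no censoring and no ties, the km value for the k-th of n events reduces to (k - 1)/n, a rescaled rank; censoring changes the spacing, because the gap between consecutive km values equals the drop in the Kaplan-Meier curve at the earlier event.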

As for why km was chosen as the default, Dr. Kevin E. Thorpe cited Dr. Therneau's reply on R-news:

> There are 2 reasons for making the KM the default:
>
> 1. Safety: The test for PH is essentially a least-squares fit of line to a plot of f(time) vs residual. If the plot contains an extreme outlier in x, then the test is basically worthless. This sometimes happens with transform = 'identity' or transform = 'log'. It doesn't with transform = 'KM'.
>
>    As a default value for naive users, I chose the safe course.
>
> 2. A secondary reason is efficiency. In DY Lin, JASA 1991, Dan-Yu argues that this is a "good" test statistic under various assumptions about censoring. (His measure has the same score statistics as the KM option.)
>
> But #1 is the big one.
>
> Terry T.
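Therneau's safety argument can be illustrated with the leverage (hat value) of a simple least-squares line, which measures how strongly a single x value can pull the fitted slope. The sketch below is our own Python illustration, not code from the survival package; the data are made up, `km_transform` and `hat_values` are hypothetical helpers, and the km transform again assumes the $1 - \hat{S}(t^-)$ definition.

```python
from collections import Counter


def km_transform(times, events):
    """1 - S_hat(t-) for each event, in time order (assumed km definition)."""
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    surv, out = 1.0, []
    for t_i in sorted(deaths):
        out.extend([1.0 - surv] * deaths[t_i])  # one value per event at t_i
        n_i = sum(1 for t in times if t >= t_i)  # at risk at t_i
        surv *= 1.0 - deaths[t_i] / n_i
    return out


def hat_values(x):
    """Leverage h_i = 1/n + (x_i - mean)^2 / Sxx for a least-squares line on x."""
    n = len(x)
    mx = sum(x) / n
    sxx = sum((a - mx) ** 2 for a in x)
    return [1 / n + (a - mx) ** 2 / sxx for a in x]


# made-up data: nine early deaths and one very late death, no censoring
times = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
events = [1] * len(times)

x_identity = sorted(float(t) for t, e in zip(times, events) if e == 1)
x_km = km_transform(times, events)

print("identity max leverage:", round(max(hat_values(x_identity)), 3))
print("km       max leverage:", round(max(hat_values(x_km)), 3))
```

Under the identity transform the death at t = 100 has a hat value of about 0.99, so it nearly determines the slope on its own. Under km the transformed times are 0, 0.1, ..., 0.9, evenly spaced because there is no censoring, and the largest hat value drops to about 0.35.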
