The flaw in your argument is that $\hat\Lambda(t)-\Lambda(t)$ is not a martingale for all times $t$. It is only a martingale up to the time $T$ when the experiment ends, that is, when the last survivor either dies or becomes censored (i.e. drops out of the study). After that, $\Lambda(t)$ continues to increase but $\hat\Lambda(t)$ does not.
Now, if the last survivor in the sample actually dies, then at this point it doesn't matter that $\hat\Lambda(t)-\Lambda(t)$ stops being a martingale, because $\hat S(t)=0$ and therefore $\hat S(t)/S(t)$ will continue to be a martingale. So without censored data, the K-M estimator is in fact unbiased (it is pretty easy to show this directly without stochastic processes).
However, if censoring exists, this raises the possibility that the last remaining survivor will become censored rather than die. In this case $\hat S(t)$ will never drop to zero, and so $\hat S(t)/S(t)$ will cease to be a martingale at time $T$. This possibility -- that the last survivor becomes censored -- is the source of bias in the K-M estimator.
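To make the last-survivor point concrete, here is a minimal sketch of the product-limit calculation in plain Python. The function name `kaplan_meier` and the toy data are made up for illustration; this is not how any particular package implements it.

```python
from fractions import Fraction

def kaplan_meier(times, events):
    """Return [(t, S_hat(t))] at each distinct death time.

    times  : observed follow-up times
    events : 1 if the subject died at that time, 0 if censored
    (hypothetical helper, for illustration only)
    """
    data = list(zip(times, events))
    surv = Fraction(1)
    curve = []
    for t in sorted({t for t, d in data if d == 1}):
        at_risk = sum(1 for u, _ in data if u >= t)
        deaths = sum(1 for u, d in data if u == t and d == 1)
        surv *= Fraction(at_risk - deaths, at_risk)
        curve.append((t, surv))
    return curve

# Last survivor dies at t = 4: the estimate drops to 0.
died_last = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])

# Same data, but the last survivor is censored at t = 4:
# the estimate stalls at its last value and never reaches 0.
censored_last = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 0])
```

In the second call the curve stays at 3/8 for every $t \ge 3$, while the true $S(t)$ keeps decreasing; that is the situation in which $\hat S(t)/S(t)$ stops being a martingale.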
Weights in a survival model give you flexibility in terms of data formatting, or a way to try to adjust estimates for sampling that wasn't representative. Therneau and Grambsch say, in Section 7.3 of Modeling Survival Data: Extending the Cox Model (Springer, 2000):
Two distinct uses for case weights (among many uses) need to be distinguished. The first is frequency weights; a weight of 3 means that 3 data points were actually observed, had the same values for all variables, and have been collapsed into a single observation to save space. The program should then treat an observation with a weight of k as if it had appeared k times in the input data set. The second is sampling weights. For instance, if 10% of the high-risk subjects for a condition were included in a study but only 1% of those with low or moderate risk, we would want to weight the observations inversely as the sampling fractions to reflect this design, giving case weights 10 times greater to the low/moderate-risk individuals than to the high-risk ones.
For a Kaplan-Meier estimate, you could just weight both the deaths and the numbers at risk at each event time $i$ by the individual case weights. Think, for example, about the first example in the quote above: for a case weight of 2, you just double-count the weighted case in the denominator so long as it is at risk, and give it a count of 2 in the numerator at its event time. I'm not sure that's how it's implemented in the `survival` package; you could check by examining the C source code for `Csurvfitkm`, which does the main calculations. Perhaps Thomas Lumley, who used to maintain the package, could discuss further.
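As a rough check of that idea, the sketch below sums case weights instead of counting heads in both the numerator and the denominator. The name `weighted_km` and the toy data are hypothetical, and this is only the calculation described above, not a claim about what the `survival` package does internally.

```python
from fractions import Fraction

def weighted_km(times, events, weights):
    """Case-weighted product-limit estimate (illustrative sketch).

    At each death time, deaths and numbers at risk are replaced by
    sums of case weights.
    """
    rows = list(zip(times, events, weights))
    surv = Fraction(1)
    curve = []
    for t in sorted({t for t, d, _ in rows if d == 1}):
        n_w = sum(w for u, _, w in rows if u >= t)             # weighted number at risk
        d_w = sum(w for u, d, w in rows if u == t and d == 1)  # weighted deaths
        surv *= 1 - Fraction(d_w) / Fraction(n_w)
        curve.append((t, surv))
    return curve

# One row with weight 2 ...
collapsed = weighted_km([2, 5, 7], [1, 0, 1], [2, 1, 1])
# ... versus the same subject entered twice with weight 1.
expanded = weighted_km([2, 2, 5, 7], [1, 1, 0, 1], [1, 1, 1, 1])
```

With frequency weights the two calls give the same curve, which is the behavior the quote asks for: a weight of k acts as if the row had appeared k times.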
For the Cox partial likelihood solution, it's essentially what you propose. For the unweighted situation, differentiating the log partial likelihood with respect to the parameter-value vector $\theta$ gives a score vector (Equation 3.4 of Therneau and Grambsch):
$$ U(\theta) = \sum_{i=1}^n \int_0^{\infty} \left[X_i(s) - \bar x(\theta,s)\right] dN_i(s) = \sum_{i=1}^n U_i(\theta)$$
where $X_i$ represents the covariate values for case $i$ and $\bar x$ is a risk-weighted mean of $X$ over observations at risk.* The maximum partial likelihood estimator $\hat \theta$ solves:
$$\sum_{i=1}^n U_i(\hat \theta) = 0. $$
With case weights $w_i$, you instead solve:
$$\sum_{i=1}^n w_i U_i(\hat \theta) = 0, $$
while also case-weighting the contributions to $\bar x(\theta,s)$* in the score vector. Variances and the information matrix are handled similarly. See Section 7.3 of Therneau and Grambsch.
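A minimal sketch of the case-weighted score, assuming a single time-fixed covariate and Breslow-style handling of ties (each tied event uses the full risk set). The function name `weighted_score` and the data are hypothetical; `coxph` itself defaults to the Efron approximation and works with covariate vectors.

```python
import math

def weighted_score(theta, times, events, x, weights):
    """Case-weighted Cox score for one time-fixed covariate (sketch)."""
    total = 0.0
    for t_i, d_i, x_i, w_i in zip(times, events, x, weights):
        if d_i != 1:
            continue  # censored cases enter only through the risk sets
        # Risk-weighted covariate mean at t_i, with r_j replaced by w_j * r_j
        num = den = 0.0
        for t_j, x_j, w_j in zip(times, x, weights):
            if t_j >= t_i:  # case j is still at risk at t_i
                wr = w_j * math.exp(theta * x_j)
                num += wr * x_j
                den += wr
        total += w_i * (x_i - num / den)
    return total

# A weight of 2 on the first case ...
collapsed = weighted_score(0.3, [2, 5, 7], [1, 0, 1], [1.0, 0.0, 2.0], [2, 1, 1])
# ... gives the same score as entering that case twice with weight 1.
expanded = weighted_score(0.3, [2, 2, 5, 7], [1, 1, 0, 1], [1.0, 1.0, 0.0, 2.0], [1, 1, 1, 1])
```

Solving `weighted_score(theta, ...) == 0` for `theta`, for example by Newton's method, would give the weighted estimate under these assumptions.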
Note that the `coxph` default Efron approximation for tied event times is implemented via temporary case weights even in an unweighted Cox regression; see Section 5.1 of the main R survival vignette.
Case weights do affect some other calculations in `coxph()`. For example, non-integer case weights (as you might have in inverse propensity score weighting) lead to calculation of a robust variance estimate; see Section 2.7 of that vignette.
*The risk score for case $i$ in a regression without case weights is $r_i(\theta,s) =\exp[\theta' X_i(s)]$. Then the risk-weighted covariate average is:
$$\bar x(\theta,s) = \frac{\sum_i Y_i(s)\, r_i(\theta,s)\, X_i(s)}{\sum_i Y_i(s)\, r_i(\theta,s)},$$
where $Y_i(s)$ is the at-risk indicator for time $s$. In the case-weighted regression, $r_i$ becomes $w_i r_i$.
The Kaplan-Meier curve does not disappear when the data are complete. The true survival function is $S(t)=1-F(t)$. The Kaplan-Meier estimator is also called the product-limit estimator. If you look it up on Wikipedia, you will find the case of complete data and the form of the KM curve as a product.
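To spell out the complete-data case, assume (for notational simplicity) $n$ subjects with no censoring and no tied death times, ordered as $t_{(1)} < \dots < t_{(n)}$. Just before $t_{(i)}$ there are $n-i+1$ subjects at risk and one of them dies, so for $t_{(k)} \le t < t_{(k+1)}$ the product telescopes:

$$\hat S(t) = \prod_{i=1}^{k} \frac{n-i}{n-i+1} = \frac{n-1}{n}\cdot\frac{n-2}{n-1}\cdots\frac{n-k}{n-k+1} = \frac{n-k}{n} = 1-\hat F_n(t),$$

where $\hat F_n(t)$ is the empirical distribution function, the fraction of subjects who have died by time $t$. Since $E[\hat F_n(t)] = F(t)$, this also shows directly that $E[\hat S(t)] = S(t)$ when there is no censoring.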