I have some data that I've analyzed using Kaplan-Meier estimation. However, I have a gut feeling that this estimator is biased because of the high censoring rate in my data (nearly 50% censored at later times). What are some ways to address this in an analysis?
Solved – High censoring rate in survival analysis
censoring, kaplan-meier
Related Solutions
Yes, these are survival analysis/event history analysis data.
The beginning of time in survival analysis is rarely calendar time; it is usually the first day the individual was observed in the study. This affects your interpretation: intervention/treatment effects are understood to act on person time (e.g. on the hazard function measured in "days since start of observation", "days since diagnosis", or "days since treatment", depending on your study design), rather than on the hazard function in calendar time (i.e. you are not trying to estimate the change in hazard due to treatment on June 3rd, 2014).
If you only followed people for 6 months, that's about 180 days. Unless everyone was readmitted by day 180, there should be some right censoring, and the survival curve should not plummet to 0 at 180 days.
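As a rough illustration of the two points above, here is a minimal sketch that converts calendar dates into "days since enrollment" and applies right censoring at the end of a 180-day follow-up window. The function name `to_study_time`, the record layout, and the dates are hypothetical, chosen only for this example.

```python
from datetime import date

FOLLOW_UP_DAYS = 180  # six-month follow-up window, as in the text

def to_study_time(enrolled, readmitted=None):
    """Return (days_since_enrollment, event_observed) for one person."""
    if readmitted is not None:
        days = (readmitted - enrolled).days
        if days <= FOLLOW_UP_DAYS:
            return days, 1  # readmission observed inside the window
    # No readmission seen within the window: right-censored at day 180.
    return FOLLOW_UP_DAYS, 0

# Hypothetical records: (enrollment date, readmission date or None)
records = [
    (date(2014, 1, 10), date(2014, 3, 1)),   # readmitted within the window
    (date(2014, 6, 3), None),                # never readmitted
    (date(2014, 2, 1), date(2014, 12, 1)),   # readmitted after the window
]
study_times = [to_study_time(e, r) for e, r in records]
print(study_times)
```

Everyone shares the same time axis regardless of enrollment date, and anyone not readmitted by day 180 contributes a censored observation rather than an event, so the curve stays above 0 at 180 days.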
Solved – High Censoring Rate in Survival Analysis; Much higher survival time among censored patients
I also found the word "censoring" to be confusing when I first started survival analysis.
"Censored" individuals aren't removed from analysis; they're just treated differently from those with events/deaths. If censoring is non-informative, as @swmo discussed, then a censored individual provides the information that the event did not occur up to the censoring time. It just doesn't provide the exact event time.
A standard survival curve includes the censored patients, noting the censoring times with a mark on the curve at the censoring time. The survival curve only drops at times of (noncensored) events. At each event time the curve falls by a fraction of its current height equal to the ratio of events at that time to the total at risk at that time, including those with later censoring times. So the survival curves for the wonder drug in your example would in fact look quite good, with the few early events leading to small drops in the curve (as the fraction of individuals dying early was small) and then a high survival fraction thereafter.
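To make the mechanics concrete, here is a minimal from-scratch sketch of the Kaplan-Meier product-limit calculation. The function `kaplan_meier` and the "wonder drug" data are hypothetical and only meant to show that censored individuals stay in the risk set and that the curve drops only at event times.

```python
from collections import Counter

def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    times  : follow-up time for each individual
    events : 1 if the event was observed at that time, 0 if censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    surv = 1.0
    curve = []
    for t in sorted(deaths):
        # At risk: everyone whose follow-up time is >= t, which includes
        # individuals who will be censored later (or at t itself).
        at_risk = sum(1 for u in times if u >= t)
        # The curve is multiplied by (1 - events / at_risk) at each event time.
        surv *= 1.0 - deaths[t] / at_risk
        curve.append((t, surv))
    return curve

# Hypothetical wonder-drug arm: 2 early deaths, 8 censored at study end (month 24).
times = [2, 5, 24, 24, 24, 24, 24, 24, 24, 24]
events = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
km_curve = kaplan_meier(times, events)
print(km_curve)
```

With 80% of this arm censored, the estimated curve has only two small steps and then stays flat near 0.8 through the end of follow-up, which is the "quite good" picture described above.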
Also, you're not usually comparing censored to uncensored patients within a single treatment group, as the question seems to suggest. Rather, you're comparing the timing of events in treatment group A to those in treatment group B. So in a test of a poor drug A versus wonder drug B, there would be many events/deaths in group A and few in group B, or at least events would tend to happen earlier in group A.
If most patients in group B are "cured" and they are not otherwise at high risk of death, then the "survival times" of the censored individuals in that group would mostly be determined by the duration of the study. A longer survival time for censored versus non-censored individuals may just mean that the study went on long enough to pick up most of those who were not "cured" by drug B.
Best Answer
The Kaplan-Meier estimator is not biased merely because a large proportion of individuals are censored, provided the censoring is non-informative. One problem we often observe is that most of the power of the log-rank test comes from early failure times, which are difficult to see in KM curves. Heavy censoring does mean that the median survival time is an unreliable point estimate (and it cannot be estimated at all if the curve never falls to 0.5). However, the hazard ratio from a Cox model serves as a good estimate of the relative risk, and it remains unbiased regardless of the amount of censoring as long as the censoring is non-informative and the proportional hazards assumption holds. Both the log-rank test and the Cox model are adequate for right-censored data; interval-censored and left-censored data generally require extensions of these methods.
However, the KM curves are biased when censoring is informative.
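The contrast between heavy non-informative censoring and informative censoring can be checked with a small simulation. The sketch below is illustrative only: the exponential event times, the two censoring mechanisms, the sample size, and the helper `km_survival_at` are all assumptions made for this example, not part of the original question's data. Both scenarios censor roughly half of the sample.

```python
import math
import random

def km_survival_at(times, events, t_eval):
    """Kaplan-Meier estimate of S(t_eval); assumes no tied times."""
    data = sorted(zip(times, events))
    n = len(data)
    surv = 1.0
    for i, (t, e) in enumerate(data):
        if t > t_eval:
            break
        if e:
            at_risk = n - i  # everyone with observed time >= t
            surv *= 1.0 - 1.0 / at_risk
    return surv

rng = random.Random(0)
n = 5000
true_times = [rng.expovariate(1.0) for _ in range(n)]  # true S(1) = exp(-1)

# Scenario A: non-informative censoring (independent censoring times).
cens = [rng.expovariate(1.0) for _ in range(n)]
obs_a = [min(t, c) for t, c in zip(true_times, cens)]
evt_a = [int(t <= c) for t, c in zip(true_times, cens)]

# Scenario B: informative censoring (half the sample drops out
# shortly before their own event would have occurred).
drop = [rng.random() < 0.5 for _ in range(n)]
obs_b = [0.9 * t if d else t for t, d in zip(true_times, drop)]
evt_b = [0 if d else 1 for d in drop]

s_true = math.exp(-1.0)
s_noninf = km_survival_at(obs_a, evt_a, 1.0)
s_inf = km_survival_at(obs_b, evt_b, 1.0)
print(f"true S(1)={s_true:.3f}  non-informative={s_noninf:.3f}  informative={s_inf:.3f}")
```

In this setup the estimate under independent censoring should land close to the true value despite the roughly 50% censoring rate, while the informative-censoring estimate should sit well above it, because the people removed from the risk set are exactly those about to fail.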