I think that the apparent discrepancy between the text and your professor has to do with the truncated (and thus missing) "observations" versus the implications for how you handle the data that you do have.
Yes, a truncated "observation" is one that is unavailable to the study because its value is out of range. But you don't have those truncated "observations" to work with at all. What you have is a sample of non-truncated observations. Wikipedia puts it nicely:
A truncated sample can be thought of as being equivalent to an underlying sample with all values outside the bounds entirely omitted, with not even a count of those omitted being kept.
Your professor's emphasis is on how you analyze the data that you do have. From that perspective in the example case, you have to treat the age values of the observations you have as left-truncated, as the sample provides no information about ages below the threshold. That applies to the entire sample at hand.
Section 5.3 of my edition of your text explains a standard situation that leads to right truncation: when you enroll participants into a study only after they have developed some disease. In that case, their times from some initiating cause (like an initial infection) to the event of developing overt disease provide no information about individuals who might have a longer time between initiating cause and overt disease. The example there is for individuals who developed AIDS following blood transfusions.
A mixed situation: time-varying covariates
In the provided example of a cutoff of > 60 years of age to be entered into a study, the entire sample needs to be treated as left-truncated if time = 0 is treated as date of birth. In other circumstances you need to evaluate each observation in your sample with respect to whether it needs to be treated as truncated or censored.
This can happen with time-varying covariates handled via a counting process, as in the R coxph() function. Say that an individual starts at time = 0 with a categorical covariate having level A and stays event-free until time = 5, at which time that covariate changes to level B, remains event-free until time = 7 when the covariate changes to level C, and has the event at time = 9 with covariate level still C. The data for that individual might be coded as follows:
start  stop  covariate  event
    0     5          A      0
    5     7          B      0
    7     9          C      1
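This layout can be written directly as a counting-process data frame for coxph(). The sketch below (variable names and data frame are my own) only constructs the Surv object; a single individual cannot support an actual model fit, so the coxph() call is shown as a comment for a hypothetical full data set:

```r
library(survival)

# Counting-process records for the single individual described above
d <- data.frame(
  id        = c(1, 1, 1),
  start     = c(0, 5, 7),
  stop      = c(5, 7, 9),
  covariate = factor(c("A", "B", "C")),
  event     = c(0, 0, 1)
)

# Each row is a (start, stop] interval: left-truncated at `start`,
# right-censored at `stop` unless `event` is 1
s <- with(d, Surv(start, stop, event))
print(s)

# With a full data set d_full one would fit, e.g.:
# fit <- coxph(Surv(start, stop, event) ~ covariate, data = d_full)
```

The key point is that the counting-process form of Surv() encodes exactly the per-interval left truncation and right censoring discussed below.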
Now you have to think about truncation and censoring for each time period for the individual. For the first time period you have simple right censoring at time = 5, as you are starting from the reference time = 0 and there is no event. It contains information with respect to a covariate value of A for the entire period starting at the time = 0 reference through time = 5, so there is no left truncation. The second time period is also right censored, here at time = 7, as there is no event then; it is treated as left truncated, as the data provide no information about a covariate level of B prior to time = 5. The third period similarly provides no information about a covariate level of C prior to time = 7, so it is left truncated, with the (uncensored) event at time = 9.
So I suppose it's best to think about truncation on a data-value by data-value basis: for what time periods do the data provide information? In some circumstances all the data values need to be considered truncated, as in the left-truncation example of the retirement home or the right-truncation when only those with the event are included in the sample. But in other situations you need to proceed more cautiously.
We could rephrase your question asking whether methods based on full data (i.e. noncensored data) are necessarily more efficient than methods based on observed data (i.e. censored data). This question can be answered in general by semiparametric efficiency theory.
Let $Z$ denote the full data (such as covariates and failure time). Suppose we have a data set of i.i.d. draws $Z_1, \dots, Z_n$. A full data estimator $\hat\beta$ for an estimand $\beta^*$ is asymptotically linear with influence function $\varphi^F$ if $$\sqrt{n} ( \hat\beta - \beta^*) = \frac{1}{\sqrt{n}} \sum_{i=1}^n \varphi^F(Z_i) + o_P(1).$$ Such an estimator has asymptotic variance $\mathrm{var}\left\{ \varphi^F(Z) \right\}$. Likewise, let $\mathcal{O}$ be the observed data, which denotes the full data $Z$ subject to coarsening or missingness. We can similarly define the influence function $\varphi$ for an observed data estimator.
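As a concrete full data example (my own, not from the text): take the estimand $\beta^* = \mathbb{E}[Z]$ with $\hat\beta$ the sample mean. Then the expansion holds exactly, with no remainder term:

```latex
\sqrt{n}\,(\bar{Z}_n - \beta^*)
  = \frac{1}{\sqrt{n}} \sum_{i=1}^n (Z_i - \beta^*),
\qquad \text{so } \varphi^F(Z) = Z - \beta^*
\quad \text{and} \quad
\mathrm{var}\{\varphi^F(Z)\} = \mathrm{var}(Z).
```

This recovers the familiar fact that the sample mean has asymptotic variance $\mathrm{var}(Z)$.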
This suggests that we can compare the efficiency of observed data estimators and full data estimators through comparisons of their influence functions. Rather than studying the influence function of a given estimator, we can study the class of influence functions of all regular estimators of the estimand $\beta^*$.
Lemma 7.4 in Tsiatis (2006) establishes the relationship between the class of influence functions of observed data estimators and the corresponding class for full data estimators. He shows that the class of observed data influence functions equals
\begin{equation*}
\frac{I(\mathcal{C}=\infty)}{\varpi(\infty, Z)} \varphi^F(Z) + L_2(\mathcal{O}),
\end{equation*}
where $\mathcal{C}=\infty$ denotes that the full data is observed (i.e. $T \leq C$ in survival analysis), $\varpi(\infty, Z) = \mathbb{P}[\mathcal{C}=\infty \mid Z]$ is the conditional probability of observing the full data, $L_2$ is an arbitrary function satisfying $\mathbb{E}[L_2(\mathcal{O})\mid Z] = 0$, and $\varphi^F$ is an arbitrary full data influence function.
Based on this identity, we can derive the asymptotic variance of an observed data asymptotically linear estimator with influence function $\varphi$ as
\begin{align*}
& \mathrm{var} \left\{ \varphi(\mathcal{O}) \right\} \\
=\, & \mathrm{var} \left[ \mathbb{E} \left\{ \varphi(\mathcal{O}) \mid Z \right\} \right] + \mathbb{E} \left[ \mathrm{var} \left\{ \varphi(\mathcal{O}) \mid Z \right\} \right] \\
=\, & \mathrm{var} \left[ \mathbb{E} \left\{ \frac{I(\mathcal{C}=\infty)}{\varpi(\infty, Z)} \varphi^F(Z) + L_2(\mathcal{O}) \mid Z \right\} \right] + \mathbb{E} \left[ \mathrm{var} \left\{ \varphi(\mathcal{O}) \mid Z \right\} \right] \\
=\, & \mathrm{var} \left[ \mathbb{E} \left\{ \frac{I(\mathcal{C}=\infty)}{\varpi(\infty, Z)} \varphi^F(Z) \mid Z \right\} \right] + \mathbb{E} \left[ \mathrm{var} \left\{ \varphi(\mathcal{O}) \mid Z \right\} \right] \\
=\, & \mathrm{var} \left[ \varphi^F(Z) \right]
+ \mathbb{E} \left[ \mathrm{var} \left\{ \varphi(\mathcal{O}) \mid Z \right\} \right] \\
\succcurlyeq\, & \mathrm{var} \left[ \varphi^F(Z) \right]
\end{align*}
This shows that any observed data estimator has asymptotic variance at least as large as that of its corresponding full data estimator. The inequality is an equality exactly when the second summand vanishes, i.e. when the conditional variance of $\varphi(\mathcal{O})$ given $Z$ is zero, which means the observed data coincide with the full data. In a survival analysis setting, this shows that whenever censoring is present, observed data estimators are less efficient than full data estimators.
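A small simulation illustrates the variance gap (my own sketch, not from the text). We estimate $\beta^* = P(T > 1)$ for $T \sim \mathrm{Exp}(1)$ under independent censoring $C \sim \mathrm{Exp}(0.5)$, comparing the full data empirical proportion with a simple observed data estimator based on $X = \min(T, C)$: under independence $P(X > 1) = P(T > 1)\,P(C > 1)$, so dividing by the (here assumed known) censoring survival probability $G(1)$ gives an unbiased observed data estimator:

```r
set.seed(42)
n  <- 2000       # sample size per replicate
B  <- 500        # Monte Carlo replicates
G1 <- exp(-0.5)  # known P(C > 1) for C ~ Exp(rate = 0.5)

full_est <- numeric(B)
cens_est <- numeric(B)
for (b in seq_len(B)) {
  T <- rexp(n, rate = 1)    # failure times (full data)
  C <- rexp(n, rate = 0.5)  # censoring times
  X <- pmin(T, C)           # observed times
  full_est[b] <- mean(T > 1)       # full-data estimator of P(T > 1)
  cens_est[b] <- mean(X > 1) / G1  # observed-data estimator, using
                                   # P(X > 1) = P(T > 1) * P(C > 1)
}

c(var_full = var(full_est), var_censored = var(cens_est))
# The observed-data estimator shows visibly larger Monte Carlo variance.
```

Both estimators center on $e^{-1} \approx 0.368$, but the observed data version pays a variance penalty for the information lost to censoring, exactly as the influence-function decomposition predicts.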
Best Answer
I also found the word "censoring" to be confusing when I first started survival analysis.
"Censored" individuals aren't removed from analysis; they're just treated differently from those with events/deaths. If censoring is non-informative, as @swmo discussed, then a censored individual provides information that the event did not occur up to the censoring time. Just doesn't provide the exact time.
A standard survival curve includes the censored patients, noting the censoring times with a mark on the curve at the censoring time. The survival curve only drops at times of (noncensored) events, with a drop given by the ratio of events at that time to the total at risk at that time, including those with later censoring times. So the survival curves for the wonder drug in your example would in fact look quite good, with the few early events leading to small drops in the curve (as the fraction of individuals dying early was small) and then a high survival fraction thereafter.
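To make the drop rule concrete, here is a hand-rolled Kaplan-Meier calculation on toy data of my own: at each event time the curve is multiplied by $1 - d/n$, where $d$ is the number of events and $n$ the number still at risk, while censored times only shrink the risk set without causing a drop:

```r
# Toy data: follow-up times and event indicators (1 = event, 0 = censored)
time  <- c(1, 2, 3, 4, 5)
event <- c(1, 0, 1, 0, 1)

km_surv <- function(time, event) {
  S   <- 1
  out <- numeric(0)
  for (t in sort(unique(time[event == 1]))) {
    at_risk <- sum(time >= t)         # risk set includes later-censored subjects
    d       <- sum(time == t & event == 1)  # events at this time
    S       <- S * (1 - d / at_risk)  # curve drops only at event times
    out     <- c(out, setNames(S, t))
  }
  out
}

km_surv(time, event)
# S(1) = 4/5 = 0.8; S(3) = 0.8 * 2/3 ~= 0.533; S(5) = 0
```

Note that the censored subjects at times 2 and 4 never produce a drop; they only reduce the at-risk counts used for the drops at times 3 and 5, which is why a group with few events keeps a high curve even when many follow-ups end in censoring.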
Also, you're not usually comparing censored to uncensored patients within a single treatment group, as the question seems to suggest. Rather, you're comparing the timing of events in treatment group A to those in treatment group B. So in a test of a poor drug A versus wonder drug B, there would be many events/deaths in group A and few in group B, or at least events would tend to happen earlier in group A.
If most patients in group B are "cured" and they are not otherwise at high risk of death, then the "survival times" of the censored individuals in that group would mostly be determined by the duration of the study. A longer survival time for censored versus non-censored individuals may just mean that the study went on long enough to pick up most of those who were not "cured" by drug B.