I think the apparent discrepancy between the text and your professor comes down to a difference in focus: the text describes the truncated (and thus missing) "observations," while your professor is concerned with how you handle the data that you do have.
Yes, a truncated "observation" is one that is unavailable to the study because its value is out of range. But you don't have those truncated "observations" to work with at all. What you have is a sample of non-truncated observations. Wikipedia puts it nicely:
"A truncated sample can be thought of as being equivalent to an underlying sample with all values outside the bounds entirely omitted, with not even a count of those omitted being kept."
Your professor's emphasis is on how you analyze the data that you do have. From that perspective in the example case, you have to treat the age values of the observations you have as left-truncated, as the sample provides no information about ages below the threshold. That applies to the entire sample at hand.
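As a minimal sketch of what "treating the sample as left-truncated" means in practice, here is how the number at risk changes once delayed entry is respected. The subjects, the ages, and the function names at_risk and naive_at_risk are made up for illustration; they are not from your text or your professor's example.

```python
# Hypothetical subjects: (entry_age, exit_age, event), with time = 0 at birth.
# Everyone entered the study after age 60, so every row is left-truncated.
subjects = [
    (62.0, 70.0, 1),
    (65.0, 68.0, 0),
    (61.0, 75.0, 1),
    (72.0, 80.0, 1),
]


def at_risk(subjects, age):
    """Count subjects under observation at `age`.

    A subject counts only after study entry (left truncation) and only up
    to exit (event or censoring): entry < age <= exit.
    """
    return sum(1 for entry, exit_, _ in subjects if entry < age <= exit_)


def naive_at_risk(subjects, age):
    """Ignore delayed entry: pretend everyone was observed from age 0."""
    return sum(1 for _, exit_, _ in subjects if age <= exit_)


for age in (63.0, 70.0, 75.0):
    print(age, at_risk(subjects, age), naive_at_risk(subjects, age))
```

The naive count includes people at ages when they could not have been observed to have the event, which is the mistake that treating the whole sample as left-truncated avoids.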
Section 5.3 of my edition of your text explains a standard situation that leads to right truncation: when you enroll participants into a study only after they have developed some disease. In that case, their times from some initiating cause (like an initial infection) to the event of developing overt disease provide no information about individuals who might have a longer time between initiating cause and overt disease. The example there is for individuals who developed AIDS following blood transfusions.
A mixed situation: time-varying covariates
In the provided example of a cutoff of > 60 years of age to be entered into a study, the entire sample needs to be treated as left-truncated if time = 0 is treated as date of birth. In other circumstances you need to evaluate each observation in your sample with respect to whether it needs to be treated as truncated or censored.
This can happen with time-varying covariates handled via a counting process, as in the R coxph() function. Say that an individual starts at time = 0 with a categorical covariate having level A and stays event-free until time = 5, at which time that covariate changes to level B, remains event-free until time = 7 when the covariate changes to level C, and has the event at time = 9 with covariate level still C. The data for that individual might be coded as follows:
start  stop  covariate  event
0      5     A          0
5      7     B          0
7      9     C          1
Now you have to think about truncation and censoring for each time period for the individual. For the first time period you have simple right censoring at time = 5, as you are starting from the reference time = 0 and there is no event. It contains information with respect to a covariate value of A for the entire period starting at the time = 0 reference through time = 5, so there is no left truncation. The second time period is also right censored, here at time = 7, as there is no event then; it is treated as left truncated, as the data provide no information about a covariate level of B prior to time = 5. The third period similarly provides no information about a covariate level of C prior to time = 7, so it is left truncated, with the (uncensored) event at time = 9.
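The row-by-row reasoning above can be sketched in a few lines of Python. This is only an illustration of the bookkeeping, not what coxph() does internally; the function name describe and the origin argument are my own assumptions.

```python
# Rows from the table above: (start, stop, covariate, event).
rows = [
    (0, 5, "A", 0),
    (5, 7, "B", 0),
    (7, 9, "C", 1),
]


def describe(row, origin=0):
    """Label one (start, stop] interval by truncation and censoring status."""
    start, stop, covariate, event = row
    return {
        "covariate": covariate,
        # Left truncated: the row says nothing about times before `start`.
        "left_truncated": start > origin,
        # Right censored: no event was observed at `stop`.
        "right_censored": event == 0,
    }


labels = [describe(r) for r in rows]
for r, lab in zip(rows, labels):
    print(r, lab)
```

Only the first row starts at the time origin, so it is the only one that is not left truncated; only the last row ends with the event, so it is the only one that is not right censored.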
So I suppose it's best to think about truncation on a data-value by data-value basis: about what time periods do the data provide information? In some circumstances all the data values need to be considered truncated, as in the left-truncation example of the retirement home or the right-truncation when only those with the event are included in the sample. But in other situations you need to proceed more cautiously.
Best Answer
Definitions vary, and the two terms are sometimes used interchangeably. I'll try to explain the most common uses using the following data set: $$ 1\qquad 1.25\qquad 2\qquad 4 \qquad 5$$
Censoring: some observations will be censored, meaning that we only know that they are below (or above) some bound. This can for instance occur if we measure the concentration of a chemical in a water sample. If the concentration is too low, the laboratory equipment cannot detect the presence of the chemical. It may still be present though, so we only know that the concentration is below the laboratory's detection limit.
If the detection limit is 1.5, so that observations that fall below this limit are censored, our example data set would become: $$ <1.5\qquad <1.5\qquad 2\qquad 4 \qquad 5,$$ that is, we don't know the actual values of the first two observations, but only that they are smaller than 1.5.
Truncation: the process generating the data is such that it is only possible to observe outcomes above (or below) the truncation limit. This can for instance occur if measurements are taken using a detector that is activated only if the signals it detects are above a certain limit. There may be lots of weak incoming signals, but we can never tell using this detector.
If the truncation limit is 1.5, our example data set would become $$2\qquad 4 \qquad 5$$ and we would not know that there in fact were two signals which were not recorded.
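To make the contrast concrete, here is a small Python sketch that applies both operations to the example data set with a limit of 1.5. The function names left_censor and left_truncate are made up for this illustration.

```python
data = [1, 1.25, 2, 4, 5]
limit = 1.5


def left_censor(values, limit):
    """Keep every observation, but replace values below the limit with a
    marker that records only that they are below it."""
    return [v if v >= limit else f"<{limit}" for v in values]


def left_truncate(values, limit):
    """Drop values below the limit entirely; no count of them is kept."""
    return [v for v in values if v >= limit]


censored = left_censor(data, limit)
truncated = left_truncate(data, limit)

print(censored)
print(truncated)
print(len(data), len(censored), len(truncated))
```

The censored sample still has five entries, so we know two observations fell below the limit; the truncated sample has three, with nothing left to indicate that anything is missing.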