Survival Analysis – Understanding the Intuition Behind the Hazard Rate

hazard, intuition, survival

I am confused about the equation that serves as the definition of the hazard rate. I get the idea of what the hazard rate is, but I just don't see how the equation expresses that intuition.

Let $x$ be a random variable representing the time of death of someone on a time interval $[0,T]$. Then the hazard rate is:

$$h(x)=\frac{f(x)}{1-F(x)}$$

Where $F(x)$ represents the probability of death until time point $x\in[0,T]$,
$1-F(x)$ represents the probability of having survived up until time point $x\in[0,T]$,
and $f(x)$ is the probability of death at point $x$.

How does dividing $f(x)$ by the survival probability $1-F(x)$ capture the intuition of the probability of dying in the next instant $\Delta t$? Shouldn't it just be $f(x)$, making the calculation of the hazard rate trivial?

Best Answer

Let $X$ denote the time of death (or time of failure if you prefer a less morbid description). Suppose that $X$ is a continuous random variable whose density function $f(t)$ is nonzero only on $(0,\infty)$. Notice that, pathological densities aside, $f(t)$ must decay to $0$ as $t \to \infty$, because otherwise $\displaystyle \int_{-\infty}^\infty f(t)\,\mathrm dt = 1$ cannot hold. Strictly speaking, $f(T)$ is not itself a probability: it is $f(T)\Delta t$ that is (approximately) the probability of death in the short interval $(T, T+\Delta t]$ of length $\Delta t$. Either way, treating $f(T)$ as the probability of death at time $T$ leads to implausible conclusions such as

You are more likely to die within the next month when you are thirty years old than when you are ninety-eight years old.

whenever $f(t)$ is such that $f(30) > f(98)$.

The reason why $f(T)$ (or $f(T)\Delta t$) is the "wrong" probability to look at is that it is an unconditional probability, whereas the chance of dying at age $T$ is of interest only to those who are still alive at age $T$ (and still mentally alert enough to read stats.SE on a regular basis!). What ought to be looked at is the probability of a $T$-year-old dying within the next month, that is,

\begin{align}
P\{X \in (T, T+\Delta t] \mid X \geq T\} &= \frac{P\{\left(X \in (T, T+\Delta t]\right) \cap \left(X\geq T\right)\}}{P\{X\geq T\}} && \scriptstyle{\text{definition of conditional probability}}\\
&= \frac{P\{X \in (T, T+\Delta t]\}}{P\{X\geq T\}}\\
&\approx \frac{f(T)\Delta t}{1-F(T)} && \scriptstyle{\text{because }X\text{ is a continuous rv}}
\end{align}

Taking $\Delta t$ to be ever shorter (a fortnight, a week, a day, an hour, a minute, etc.) and dividing the probability by $\Delta t$ to express it per unit time, we come to the conclusion that the (instantaneous) hazard rate for a $T$-year-old is

$$h(T) = \frac{f(T)}{1-F(T)}$$

in the sense that the probability that a $T$-year-old dies in the next femtosecond $(\Delta t)$ is approximately $\displaystyle \frac{f(T)\Delta t}{1-F(T)}.$
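To make the contrast between $f$ and $h$ concrete, here is a minimal numerical sketch. It assumes a Weibull lifetime model with made-up parameters (`SHAPE` and `SCALE` are illustrative choices, not values from this answer); any distribution with an increasing hazard would make the same point. Under this assumed model the density is larger at age 30 than at age 98, while the hazard is far larger at 98, and for a small $\Delta t$ the exact conditional probability of death is close to $h(T)\Delta t$.

```python
import math

# Hypothetical Weibull lifetime model. SHAPE and SCALE are illustrative
# assumptions chosen for this sketch, not values taken from the answer.
SHAPE = 8.0
SCALE = 70.0  # years


def survival(t):
    """1 - F(t): probability of still being alive at age t."""
    return math.exp(-((t / SCALE) ** SHAPE))


def density(t):
    """f(t): derivative of the CDF F(t) = 1 - survival(t)."""
    return (SHAPE / SCALE) * (t / SCALE) ** (SHAPE - 1) * survival(t)


def hazard(t):
    """h(t) = f(t) / (1 - F(t))."""
    return density(t) / survival(t)


def conditional_death_prob(t, dt):
    """P(t < X <= t + dt | X > t), computed exactly from the survival function."""
    return (survival(t) - survival(t + dt)) / survival(t)


dt = 1e-4  # a short interval, in years
for age in (30, 98):
    print(
        f"age {age}: f = {density(age):.3e}, h = {hazard(age):.3e}, "
        f"exact conditional = {conditional_death_prob(age, dt):.3e}, "
        f"h * dt = {hazard(age) * dt:.3e}"
    )
```

The approximation $h(T)\Delta t$ degrades as $\Delta t$ grows, so for an interval as long as a month the exact conditional probability is the safer number to quote.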

Note that in contrast to the density $f(t)$ integrating to $1$, the integral $\displaystyle \int_0^\infty h(t)\, \mathrm dt$ must diverge. This is because the CDF $F(t)$ is related to the hazard rate through

$$F(t) = 1 - \exp\left(-\int_0^t h(\tau)\, \mathrm d\tau\right)$$

and since $\lim_{t\to \infty}F(t) = 1$, it must be that

$$\lim_{t\to \infty} \int_0^t h(\tau)\, \mathrm d\tau = \infty.$$

In other words, the integral of the hazard rate necessarily diverges; divergence is not merely a possibility.
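For readers who want the missing step, here is a short derivation sketch of the relation between $F$ and $h$. It assumes $F(0)=0$ (consistent with the density being nonzero only on $(0,\infty)$) and that $F$ is differentiable with $F'=f$:

\begin{align}
h(\tau) &= \frac{f(\tau)}{1-F(\tau)} = -\frac{\mathrm d}{\mathrm d\tau}\ln\bigl(1-F(\tau)\bigr),\\
\int_0^t h(\tau)\,\mathrm d\tau &= -\ln\bigl(1-F(t)\bigr) + \ln\bigl(1-F(0)\bigr) = -\ln\bigl(1-F(t)\bigr),\\
F(t) &= 1-\exp\left(-\int_0^t h(\tau)\,\mathrm d\tau\right).
\end{align}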

Typical hazard rates are increasing functions of time, but constant hazard rates (exponential lifetimes) are possible. Both of these kinds of hazard rates obviously have divergent integrals. A less common scenario (for those who believe that things improve with age, like fine wine does) is a hazard rate that decreases with time but slowly enough that the integral diverges.
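As a rough illustration of the decreasing case, the sketch below compares two hypothetical decreasing hazards chosen purely for convenience: $h(t)=1/(1+t)$, whose integral $\ln(1+t)$ diverges, and $h(t)=1/(1+t)^2$, whose integral $t/(1+t)$ stays below $1$. Only the first yields a CDF that tends to $1$; the second leaves a positive probability of never dying, so under these assumptions it cannot be the hazard of a proper lifetime distribution.

```python
import math


def cumulative_hazard_slow(t):
    """Integral of the hypothetical hazard 1/(1+s) over [0, t]: log(1+t), unbounded."""
    return math.log1p(t)


def cumulative_hazard_fast(t):
    """Integral of the hypothetical hazard 1/(1+s)**2 over [0, t]: t/(1+t), bounded by 1."""
    return t / (1.0 + t)


def cdf_from_cumulative_hazard(big_h):
    """F(t) = 1 - exp(-H(t)), where H(t) is the integrated hazard."""
    return 1.0 - math.exp(-big_h)


for t in (1.0, 1e3, 1e6, 1e9):
    f_slow = cdf_from_cumulative_hazard(cumulative_hazard_slow(t))
    f_fast = cdf_from_cumulative_hazard(cumulative_hazard_fast(t))
    print(f"t = {t:.0e}: F_slow = {f_slow:.9f}, F_fast = {f_fast:.9f}")
```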
