Survival Analysis – Understanding the Intuition Behind the Hazard Rate


I am confused about the equation that serves as the definition of the hazard rate. I get the idea of what the hazard rate is, but I just don't see how the equation expresses that intuition.

Let $x$ be a random variable representing the time of death of someone on a time interval $[0,T]$. Then the hazard rate is:

$$h(x)=\frac{f(x)}{1-F(x)}$$

Where $F(x)$ represents the probability of death until time point $x\in[0,T]$,
$1-F(x)$ represents the probability of having survived up until time point $x\in[0,T]$,
and $f(x)$ is the probability of death at point $x$.

How does dividing $f(x)$ by the survival rate explain the intuition of the probability of instantaneous death in the next $\Delta t$? Shouldn't it just be $f(x)$, making the calculation of the hazard rate trivial?

Best Answer

Let $X$ denote the time of death (or time of failure if you prefer a less morbid description). Suppose that $X$ is a continuous random variable whose density function $f(t)$ is nonzero only on $(0,\infty)$. Now, notice that $f(t)$ must decay away to $0$ as $t \to \infty$, because otherwise $\displaystyle \int_{-\infty}^\infty f(t)\,\mathrm dt = 1$ cannot hold. Strictly, $f(T)$ is not itself a probability: it is $f(T)\Delta t$ that is (approximately) the probability of death in the short interval $(T, T+\Delta t]$ of length $\Delta t$. Either way, your notion that $f(T)$ measures the chance of death at time $T$ leads to implausible and unbelievable conclusions such as

You are more likely to die within the next month when you are thirty years old than when you are ninety-eight years old.

whenever $f(t)$ is such that $f(30) > f(98)$.

The reason why $f(T)$ (or $f(T)\Delta t$) is the "wrong" probability to look at is that it is an unconditional probability, averaged over everyone regardless of whether they survive to age $T$, whereas the chance of dying at age $T$ is of interest only to those who are alive at age $T$ (and still mentally alert enough to read stats.SE on a regular basis!). What ought to be looked at is the probability of a $T$-year old dying within the next month, that is,

\begin{align}
P\{X \in (T, T+\Delta t] \mid X \geq T\} &= \frac{P\{\left(X \in (T, T+\Delta t]\right) \cap \left(X\geq T\right)\}}{P\{X\geq T\}} && \scriptstyle{\text{definition of conditional probability}}\\
&= \frac{P\{X \in (T, T+\Delta t]\}}{P\{X\geq T\}}\\
&\approx \frac{f(T)\Delta t}{1-F(T)} && \scriptstyle{\text{because }X\text{ is a continuous rv}}
\end{align}

Choosing $\Delta t$ to be a fortnight, a week, a day, an hour, a minute, etc. we come to the conclusion that the (instantaneous) hazard rate for a $T$-year old is

$$h(T) = \frac{f(T)}{1-F(T)}$$

in the sense that the approximate probability of death in the next femtosecond $(\Delta t)$ of a $T$-year old is $\displaystyle \frac{f(T)\Delta t}{1-F(T)}.$
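
To see the difference between $f$ and $h$ numerically, here is a small R sketch. The Weibull lifetime with shape 2 and scale 40 is purely an illustrative assumption (it is not meant as a realistic mortality model); `dweibull` and `pweibull` are base R functions.

```r
# Illustrative assumption: lifetimes follow a Weibull(shape = 2, scale = 40).
shape <- 2
scale <- 40
ages  <- c(30, 98)

f <- dweibull(ages, shape = shape, scale = scale)      # density f(T)
S <- pweibull(ages, shape = shape, scale = scale,
              lower.tail = FALSE)                      # survival 1 - F(T)
h <- f / S                                             # hazard h(T)

print(data.frame(age = ages, density = f, survival = S, hazard = h))
```

Under this assumed distribution the density is larger at 30 than at 98, yet the hazard is larger at 98 (0.1225 versus 0.0375 per year), because so few people are still alive at 98 to share that small slice of density.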

Note that in contrast to the density $f(t)$ integrating to $1$, the integral $\displaystyle \int_0^\infty h(t)\, \mathrm dt$ must diverge. This is because the CDF $F(t)$ is related to the hazard rate through

$$F(t) = 1 - \exp\left(-\int_0^t h(\tau)\, \mathrm d\tau\right)$$ and since $\lim_{t\to \infty}F(t) = 1$, it must be that $$\lim_{t\to \infty} \int_0^t h(\tau)\, \mathrm d\tau = \infty,$$ that is, the integral of the hazard rate must diverge.
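
For completeness, the relation between $F$ and $h$ used above follows in a few lines from the definition of the hazard rate, assuming $F(0)=0$ (which holds here because the density is nonzero only on $(0,\infty)$):

\begin{align}
h(\tau) &= \frac{f(\tau)}{1-F(\tau)} = -\frac{\mathrm d}{\mathrm d\tau}\ln\bigl(1-F(\tau)\bigr),\\
\int_0^t h(\tau)\,\mathrm d\tau &= -\ln\bigl(1-F(t)\bigr) + \ln\bigl(1-F(0)\bigr) = -\ln\bigl(1-F(t)\bigr),\\
F(t) &= 1-\exp\left(-\int_0^t h(\tau)\,\mathrm d\tau\right).
\end{align}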

Typical hazard rates are increasing functions of time, but constant hazard rates (exponential lifetimes) are possible. Both of these kinds of hazard rates obviously have divergent integrals. A less common scenario (for those who believe that things improve with age, like fine wine does) is a hazard rate that decreases with time but slowly enough that the integral diverges.
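
The three kinds of hazard rate can be compared with a short R sketch. The Weibull family with scale 1 is an assumption chosen only because its shape parameter switches between increasing, constant, and decreasing hazards; its cumulative hazard is $t^k$ for shape $k$.

```r
# Illustrative assumption: Weibull lifetimes with scale 1 and three shapes.
# shape > 1: increasing hazard; shape = 1: constant; shape < 1: decreasing.
shapes <- c(increasing = 2, constant = 1, decreasing = 0.5)
times  <- c(1, 100, 10000)

# Cumulative hazard H(t) = -log(1 - F(t)), computed on the log scale
# to avoid underflow of the survival probability at large t.
H <- sapply(shapes, function(k)
  -pweibull(times, shape = k, scale = 1, lower.tail = FALSE, log.p = TRUE))
rownames(H) <- paste0("t=", times)
print(H)
```

Even in the decreasing case (shape 0.5) the cumulative hazard equals $\sqrt{t}$, which keeps growing without bound, only slowly.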

Related Solutions

Cumulative Hazard Function – Intuition in Survival Analysis

Combining the proportions dying as you do does not give you the cumulative hazard. The hazard rate in continuous time is the limit, per unit of time, of the conditional probability that an event will happen during a very short interval:

$$h(t) = \lim_{\Delta t \rightarrow 0} \frac {P(t<T \le t + \Delta t | T >t)} {\Delta t}$$
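
A quick numerical check of this limit in R, as a sketch: the exponential lifetime with rate 0.5 and the evaluation point `t0 = 3` are assumptions made for illustration, chosen because the exponential hazard is known to equal its rate.

```r
# Illustrative assumption: T ~ Exponential(rate = 0.5), whose hazard is 0.5.
rate <- 0.5
t0   <- 3
dts  <- c(1, 0.1, 0.01, 0.001)

# P(t0 < T <= t0 + dt | T > t0) for shrinking dt
cond_prob <- (pexp(t0 + dts, rate) - pexp(t0, rate)) /
  pexp(t0, rate, lower.tail = FALSE)

# Dividing by dt turns the shrinking probability into a rate
ratio <- cond_prob / dts

print(data.frame(dt = dts, cond_prob = cond_prob, ratio = ratio))
```

The conditional probability itself shrinks towards 0 as `dt` shrinks, while the ratio settles near the rate 0.5.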

Cumulative hazard is the integral of the (instantaneous) hazard rate over age or time. It is like summing up probabilities, but since $\Delta t$ is very small, these probabilities are also small numbers (e.g. the hazard rate of dying may be around 0.004 at ages around 30). The hazard rate is conditional on not having experienced the event before $t$, so for a population the cumulative hazard may exceed 1.

You can look up a human mortality life table (although this is a discrete-time formulation) and try accumulating the age-specific death rates $m_x$.

If you use R, here's a little example of approximating these functions from the number of deaths in each 1-year age interval:

dx <-  c(3184L, 268L, 145L, 81L, 64L, 81L, 101L, 50L, 72L, 76L, 50L, 
         62L, 65L, 95L, 86L, 120L, 86L, 110L, 144L, 147L, 206L, 244L, 
         175L, 227L, 182L, 227L, 205L, 196L, 202L, 154L, 218L, 279L, 193L, 
         223L, 227L, 300L, 226L, 256L, 259L, 282L, 303L, 373L, 412L, 297L, 
         436L, 402L, 356L, 485L, 495L, 597L, 645L, 535L, 646L, 851L, 689L, 
         823L, 927L, 878L, 1036L, 1070L, 971L, 1225L, 1298L, 1539L, 1544L, 
         1673L, 1700L, 1909L, 2253L, 2388L, 2578L, 2353L, 2824L, 2909L, 
         2994L, 2970L, 2929L, 3401L, 3267L, 3411L, 3532L, 3090L, 3163L, 
         3060L, 2870L, 2650L, 2405L, 2143L, 1872L, 1601L, 1340L, 1095L, 
         872L, 677L, 512L, 376L, 268L, 186L, 125L, 81L, 51L, 31L, 18L, 
         11L, 6L, 3L, 2L)

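x <- 0:(length(dx) - 1)                   # age vector (start of each 1-year interval)

fx <- dx / sum(dx)                        # share of all deaths occurring at each age
Sx <- 1 - c(0, cumsum(fx)[-length(fx)])   # share still alive at the start of each age
hx <- fx / Sx                             # approximate hazard at each age

# Dividing by survivors at the start of the interval avoids a division by zero
# at the last age, where nobody is left alive at the end of the interval.
plot(x, hx, type = "l", xlab = "age", ylab = "h(t)", main = "h(t)", log = "y")
plot(x, cumsum(hx), type = "l", xlab = "age", ylab = "H(t)", main = "H(t)")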

Hope this helps.

Solved – a hazard rate

It is the number of times you would expect to experience the event per unit of time, given that you have survived thus far. The key difference from your definition is that it is a rate, not a probability.
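
A small R sketch of the rate-versus-probability distinction. The constant hazard of 3 events per year is an assumed value chosen only because it exceeds 1, which no probability can do.

```r
# Illustrative assumption: a constant hazard of 3 events per year.
hazard <- 3

# Probability of experiencing the event within one year, for someone at risk now
p_one_year <- pexp(1, rate = hazard)          # 1 - exp(-3)

# Changing the time unit rescales the rate, not the underlying risk
hazard_per_month <- hazard / 12
p_via_months <- pexp(12, rate = hazard_per_month)

print(c(hazard_per_year = hazard, hazard_per_month = hazard_per_month,
        p_one_year = p_one_year, p_via_months = p_via_months))
```

The hazard changes with the time unit (3 per year, 0.25 per month), while the one-year probability stays the same and remains below 1.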
