Solved – Trying to understand formula for the Survival Function (survival analysis)

probability, survival

I'm trying to learn the Cox Proportional Hazards Model on my own, and found this link that describes it in clear terms. But when I get to Formula (5) ($S(t) = \exp(-H(t))$) I can't figure out where that's coming from. On the author's previous page, he shows that the survival function equals $S(t) = \exp(-H(t))$ if we assume an exponential distribution, but in Cox we don't assume that.

Is $S(t) = \exp(-H(t))$ something that works for any hazard distribution? I can't think of a way to prove or disprove this, and the intuition isn't making sense to me.

Best Answer

All of these terms are standard in actuarial science, and all of them apply to any distribution (though when I have seen them while studying for exams, we were almost always talking about distributions defined only on the nonnegative reals). $H(t)$ is the cumulative hazard function, and for any distribution it is defined as $$H(t) = \int_0^t h(x) \,dx.$$ The name makes perfect sense with this definition, since we are "adding up" the hazard function up to a certain point to get the cumulative hazard function.

Now, since $$f(t) = F'(t) = -S'(t),$$ we have $$h(t) = \frac{f(t)}{S(t)} = \frac{-S'(t)}{S(t)} = -\frac{d}{dt} (\ln S(t)).$$

Finally, that means $$H(t) = \int_0^t -\frac{d}{dx} (\ln S(x)) \,dx = -\ln S(t) + \ln S(0) = -\ln S(t),$$ since $S(0)$ is usually required to be 1 and thus $\ln S(0) = 0$. Exponentiating both sides gives $S(t) = \exp(-H(t))$. Nothing in the derivation uses the exponential distribution; it only needs a density $f = -S'$ and $S(0) = 1$.
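To see the identity at work for a non-exponential case, here is a minimal numerical sketch in R. The Weibull distribution and its `shape` and `scale` values are my own illustrative assumptions, not something from the question; the hazard is integrated numerically and compared with the survival function computed directly.

```r
# Check S(t) = exp(-H(t)) for a Weibull distribution, which is not
# exponential when shape != 1. Parameter values are illustrative assumptions.
shape <- 1.7
scale <- 2.5

# Hazard h(x) = f(x) / S(x), built from base R's Weibull density and CDF
hazard <- function(x) {
  dweibull(x, shape, scale) / pweibull(x, shape, scale, lower.tail = FALSE)
}

# Cumulative hazard H(t): numerical integral of h from 0 to t
cum_hazard <- function(t) integrate(hazard, lower = 0, upper = t)$value

t_grid   <- c(0.5, 1, 2, 4)
S_direct <- pweibull(t_grid, shape, scale, lower.tail = FALSE)
S_from_H <- exp(-sapply(t_grid, cum_hazard))

# The two survival columns should agree up to numerical integration error
print(cbind(t = t_grid, S_direct, S_from_H))
```

Any other continuous distribution on the nonnegative reals could be swapped in by replacing the density and CDF calls.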

Related Solutions

Solved – Can Hazard Ratio be translated into ratio of medians of survival time

Your intuition is correct. The following relationship between survival functions holds: $$ S_r(t)=S_0(t)^r $$ where $r$ is the hazard ratio (see, e.g. the Wikipedia article Hazard ratio). From this we may show that your statement implies an exponential survival function.

Let us denote the medians by $M_r$, $M_0$ for two variables with hazard ratio $r$. Your statement implies $$ M_r = M_0/r $$ From the definition of the median, we get $$ S_r(M_0/r)=0.5 $$ Then, we substitute the relationship between survival functions $$ S_0(M_0/r)^r=0.5 \Rightarrow S_0(M_0/r) = 0.5^{1/r} $$ This holds for any $r$, so writing $t = M_0/r$ (that is, $1/r = t/M_0$) gives $$ S_0(t) = 0.5^{t/M_0} = e^{t\frac{\log 0.5}{M_0}} $$ Hence, if the statement in your question holds for arbitrary HR, the survival distribution must be exponential.
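As a worked check of that conclusion, take a Weibull baseline (my own choice of example, with assumed shape $k$ and scale $\lambda$): $$ S_0(t) = \exp\left(-(t/\lambda)^k\right), \qquad M_0 = \lambda (\log 2)^{1/k} $$ Under proportional hazards, $S_r(t) = S_0(t)^r = \exp\left(-r (t/\lambda)^k\right)$, and solving $S_r(M_r) = 0.5$ gives $$ M_r = \lambda \left(\frac{\log 2}{r}\right)^{1/k} = M_0 \, r^{-1/k} $$ This equals $M_0/r$ only when $k = 1$, which is exactly the exponential case.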

Cumulative Hazard Function – Intuition in Survival Analysis

Combining proportions dying as you do is not giving you cumulative hazard. The hazard rate in continuous time is the conditional probability that the event happens during a very short interval, given that it has not happened before, divided by the length of that interval:

$$h(t) = \lim_{\Delta t \rightarrow 0} \frac {P(t<T \le t + \Delta t | T >t)} {\Delta t}$$

Cumulative hazard is the integral of the (instantaneous) hazard rate over ages/time. It's like summing up probabilities, but since $\Delta t$ is very small, these probabilities are also small numbers (e.g. the hazard rate of dying may be around 0.004 at ages around 30). The hazard rate is conditional on not having experienced the event before $t$, so it is not bounded the way a probability is, and the cumulative hazard for a population may exceed 1.
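The limit in the definition can be illustrated numerically. The sketch below assumes a Weibull lifetime with made-up `shape` and `scale` values and an arbitrary time `t0` (all my assumptions, not part of the question), and compares the conditional probability divided by the interval length with the exact hazard $f(t)/S(t)$ as the interval shrinks.

```r
# Approximate h(t) by P(t < T <= t + dt | T > t) / dt for shrinking dt.
# Weibull parameters and the evaluation time t0 are illustrative assumptions.
shape <- 1.7
scale <- 2.5
t0    <- 2

surv <- function(t) pweibull(t, shape, scale, lower.tail = FALSE)

# Exact hazard f(t0) / S(t0)
h_exact <- dweibull(t0, shape, scale) / surv(t0)

# Conditional probability of the event in (t0, t0 + dt], divided by dt
dt       <- 10^(-(1:5))
h_approx <- (surv(t0) - surv(t0 + dt)) / surv(t0) / dt

# h_approx should move toward h_exact as dt gets smaller
print(cbind(dt, h_approx, h_exact))
```

The conditional probability itself goes to zero with `dt`; it is the ratio to `dt` that settles at the hazard rate.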

You can look up a human mortality life table (although that is a discrete-time formulation) and try to accumulate the age-specific death rates $m_x$.

If you use R, here's a little example of approximating these functions from the number of deaths in each 1-year age interval:

dx <-  c(3184L, 268L, 145L, 81L, 64L, 81L, 101L, 50L, 72L, 76L, 50L, 
         62L, 65L, 95L, 86L, 120L, 86L, 110L, 144L, 147L, 206L, 244L, 
         175L, 227L, 182L, 227L, 205L, 196L, 202L, 154L, 218L, 279L, 193L, 
         223L, 227L, 300L, 226L, 256L, 259L, 282L, 303L, 373L, 412L, 297L, 
         436L, 402L, 356L, 485L, 495L, 597L, 645L, 535L, 646L, 851L, 689L, 
         823L, 927L, 878L, 1036L, 1070L, 971L, 1225L, 1298L, 1539L, 1544L, 
         1673L, 1700L, 1909L, 2253L, 2388L, 2578L, 2353L, 2824L, 2909L, 
         2994L, 2970L, 2929L, 3401L, 3267L, 3411L, 3532L, 3090L, 3163L, 
         3060L, 2870L, 2650L, 2405L, 2143L, 1872L, 1601L, 1340L, 1095L, 
         872L, 677L, 512L, 376L, 268L, 186L, 125L, 81L, 51L, 31L, 18L, 
         11L, 6L, 3L, 2L)

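x <- 0:(length(dx) - 1)  # age vector

f <- dx / sum(dx)        # share of all deaths occurring at each age
S <- 1 - cumsum(f)       # share still alive at the end of each age interval
h <- f / S               # approximate hazard; Inf at the last age, where S = 0
H <- cumsum(h)           # approximate cumulative hazard

plot(x, h, type = "l", xlab = "age", ylab = "h(t)", main = "h(t)", log = "y")
plot(x, H, type = "l", xlab = "age", ylab = "H(t)", main = "H(t)")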
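The link between $S$ and $H$ can also be checked in discrete time. Here is a minimal sketch on made-up death counts (the vector `d` and cohort size `n` are hypothetical, not the data above). It assumes the convention of dividing deaths by the number alive at the start of each interval, which differs slightly from the calculation above.

```r
# Hypothetical death counts for six age intervals (not the data above)
d <- c(10, 20, 30, 40, 60, 90)
n <- 1000  # hypothetical cohort size; 750 people survive past the last interval

# Number alive at the start of each interval
l_start <- n - c(0, cumsum(d)[-length(d)])

q <- d / l_start        # discrete hazard: deaths / alive at interval start
H <- cumsum(q)          # accumulated discrete hazard
S <- 1 - cumsum(d) / n  # share alive at the end of each interval

# exp(-H) tracks S closely while the per-interval hazards stay small
print(cbind(q, H, S, exp_minus_H = exp(-H)))
```

In discrete time the match is only approximate, since $S$ is the product of the $(1 - q_x)$ terms and $1 - q \le e^{-q}$; the gap shrinks as the intervals get shorter.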

Hope this helps.
