Solved – the definition of “death rate” in survival analysis

I am reading a text on survival analysis (Smith's 2002 Analysis of Failure and Survival Data). All concepts like hazard function, survival function, density of survival variable $Y$ are rigorously defined. However, in an exercise (p.14) it says "Suppose that the death rate of a person that smokes is at each age twice that of a non-smoker. If $h_s(y)$ denotes the hazard rate of a smoker at age $y$ and $h_n(y)$ that of a non-smoker at age $y$, write an equation relating $h_s(y)$ and $h_n(y)$".

This left me puzzled about what the author means by "death rate" (it has not been defined, as far as I can see). Perhaps he means the derivative of the survival function or something similar.

This idea may be backed up by a Google search, in which I found the definition

$$\lambda = \frac{D}{T}$$

where $D$ is the number of deaths in a time interval $T$. In probability terms, I would thus conclude that a useful definition may be

$$\lambda = \frac{F(y+t) - F(y)}{t}$$

where $F$ is the cumulative distribution function of $Y$. Furthermore, if I let $t \rightarrow 0$, I get

$$\lim_{t \rightarrow 0} \frac{F(y+t) - F(y)}{t} = f(y) = -S'(y)$$

where $f$ is the density and $S$ is the survival function.

Is this logic formalized, or an established convention, somewhere?

Best Answer

A rate has a specific definition: $\frac{\# \mbox{events}}{\# \mbox{person-years}}$. A risk, on the other hand, refers to a particular individual's risk of experiencing an outcome of interest, and it is risk that is intrinsically related to the hazard (instantaneous risk). The language the question uses is consistent with this understanding. If I had to change it, I would say, "The death rate for smokers is twice that of non-smokers". They also failed to mention whether these were age-adjusted rates or not.

To understand this a little more deeply, relative rates and relative risks are estimated with fundamentally different models.

If you wanted to formalize a rate, you can think of this as estimating:

$$E \left( \frac{\# \mbox{events}}{\# \mbox{person-years}} \right) =\frac{\sum_i Pr(Y_i < t_i)} {\sum_i t_i} $$

($Y_i$ is the death time and $t_i$ is the observation time for the $i$-th individual; note that the times $t_i$ are considered fixed and not random!)

You'll recognize that the numerator is a sum of CDFs, i.e. one minus the survival functions, and the relationship between survival functions and hazards is well known.
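For reference, the standard identities presumably meant here can be written in the question's notation ($F$, $f$, $S$, $h$), assuming $Y$ is a continuous, non-negative survival time:

$$S(y) = 1 - F(y), \qquad h(y) = \frac{f(y)}{S(y)} = -\frac{d}{dy}\log S(y), \qquad S(y) = \exp\left(-\int_0^y h(u)\,du\right)$$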

So if you took a ratio of rates, indexing non-smokers by $i$ and smokers by $j$:

$$ 2 = E \left( \frac{ \# \mbox{smoker deaths} \times \# \mbox{non-smoker person-years}}{\# \mbox{non-smoker deaths} \times \# \mbox{smoker person-years}} \right) = \frac{\sum_i t_i}{\sum_j t_j} \frac{\sum_j Pr(Y_j < t_j)}{\sum_i Pr(Y_i < t_i)}$$

$$ = \frac{\sum_i t_i}{\sum_j t_j} \frac{ n_j-\sum_jS(t_j)}{n_i-\sum_iS(t_i)}$$
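As a rough numerical sketch of the ratio above, one can plug in a concrete survival model. Everything specific here is an assumption made for illustration and does not come from the textbook or the exercise: a constant hazard in each group (so $S(t) = e^{-ht}$), the hazard values, the observation times, and the helper names `expected_rate` and `rate_ratio`.

```python
import math


def expected_rate(hazard, obs_times):
    """Expected deaths per unit of observation time:
    sum_i Pr(Y_i < t_i) / sum_i t_i.

    Assumes a constant hazard, so Pr(Y < t) = 1 - exp(-hazard * t),
    and treats the observation times as fixed (not random).
    """
    expected_deaths = sum(1.0 - math.exp(-hazard * t) for t in obs_times)
    return expected_deaths / sum(obs_times)


def rate_ratio(h_smoker, h_nonsmoker, t_smokers, t_nonsmokers):
    """Ratio of the smokers' expected rate to the non-smokers' expected rate."""
    return (expected_rate(h_smoker, t_smokers)
            / expected_rate(h_nonsmoker, t_nonsmokers))


# Hypothetical inputs: the smoker hazard is twice the non-smoker hazard,
# and both groups share the same fixed observation times (in years).
h_n = 0.01
h_s = 2 * h_n
times = [5.0, 10.0, 20.0]

ratio = rate_ratio(h_s, h_n, times, times)
print(f"ratio of expected rates: {ratio:.4f}")
```

Under these assumptions the ratio of expected rates is close to 2 when the observation times are short relative to $1/h$ and drifts below 2 as follow-up lengthens, which is one way to see how the rate and the hazard are related without being the same quantity.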

Since this is self-study, you should probably do the algebra and solve the remainder of the equation yourself!
