Solved – Bayesian Survival Analysis: please, write me a prior for Kaplan Meier!

bayesiankaplan-meiersurvival

Consider right-censored observations, with events at times $t_1, t_2, \dots$.
The number of susceptible individuals at time $i$ is $n_i$, and the number of events at time $i$ is $d_i$.

The Kaplan-Meier or product estimator arises naturally as a MLE when the survival function is a step function $S(t) = \prod_{i : t_i < t} \alpha_i$. The likelihood is then
$$ L(\alpha) = \prod_i (1-\alpha_i)^{d_i} \alpha_i^{n_i-d_i} $$
and the MLE is $\widehat\alpha_i = 1 – {d_i\over n_i}$.

OK, now assume that I want to go Bayesian. I need some kind of “natural'' prior with which I will multiply $L(\alpha)$, right?

Googling the obvious keywords I found that the Dirichlet process is a good prior. But as far as I understand, it is also a prior on the discontinuity points $t_i$?

This is surely very interesting and I am eager to learn about it, however I would settle for something simpler. I begin to suspect that’s not so easy as I first thought, and it’s time to ask for your advice…

Many thanks in advance!

PS: A few precision on what I am hoping I am interested in (as simple as possible) explanations about the way to handle the Dirichlet process prior, however I think it should be possible to use simply a prior on the $\alpha_i$ — that is a prior on the step functions with discontinuities in $t_i$.

I think that the "global shape" of the step functions sampled in the prior should not depends on the $t_i$'s — there should be an underlying family of continuous functions which are approximated by these step functions.

I don't know if the $\alpha_i$ should be independent (I doubt it). If they are, I think this implies that the prior $\alpha_i$ depends on$\Delta t_i = t_i – t_{i-1}$, and if we denote its distribution by $A(\Delta t)$ then the product of a $A(\Delta_1)$ variable by an independent $A(\Delta_2)$ variable is a $A(\Delta_1+\Delta_2)$ variable. It seems here that log-$\Gamma$ variables can be useful.

But here basically I am stuck. I did not type this at first because I did not want to direct all answers in this direction. I would particularly appreciate answers with bibliographic references to help me justifying my final choice.

Best Answer

Note that because your likelihood function is a product of $ \alpha_i $ functions - the data are telling you that there is no evidence for correlation between them. Note that the $ d_i $ variables are already scaling to account for time. Longer time period means more chance for events, generally meaning larger $ d_i $.

The most basic way to "go Bayesian" here is to use independent uniform priors $ p (\alpha_i)=1 $. Note that $0 <\alpha_i <1 $ so this is a proper prior - hence posterior is also proper. The posterior is independent beta distributions with parameters $ p (\alpha_i)\sim beta (n_i-d_i+1, d_i+1) $. This can be easily simulated to generate the posterior distribution of the survival curve, using rbeta () function in R for example.

I think this gets at your main question about a "simpler" method. Below is just the beginings of an idea to create a better model, that retains the flexible KM form for the survival function.

I think the main problem with the KM curve is in the Survival function though, and not in the prior. For example, why should the $ t_i $ values correspond to time points that were observed? Wouldn't it make more sense to place them at points corresponding to meaningful event times based on the actual process? If the observed time points are too far apart, the KM curve will be "too smooth". If they are too close, the KM curve will be "too rough", and potentially exhibit abrupt changes. One way to deal with the "too rough" problem is to place a correlated prior on $\alpha $ such that $\alpha_i\approx \alpha_{i+1} $. The effect of this prior will be to shrink nearby parameters closer together. You could use this in the "log-odds" space $\eta_i=\log\left (\frac {\alpha_i}{1-\alpha_i}\right) $ and use a kth order random walk prior on $\eta $. For a first order random walk this introduces penalties of the form $-\tau(\eta_i -\eta_{i-1})^2 $ into the log-likelihood. The BayesX software has some very good documentation of this kind of smoothing. Basically choosing the order k is like doing a kth order local polynomial. If you like splines, choose k=3. Of course, by using a "fine" time grid you will have time points with no observations. Howdver, this complicates your likelihood function, as the $ n_i, d_i$ are missing for some $i $. For example if $( t_0,t_1) $ was split into 3 "finer" intervals $(t_{00}, t_{01}, t_{02}, t_{10}) $ then you don't know $ n_{02}, n_{10}, d_{01}, d_{02}, d_{10} $ but only $ n_1=n_{01}$ and $ d_1=d_{01}+d_{02}+d_{10} $. So you would probably need to add these "missing data" and use an EM algorithm or perhaps VB (provided you're not going down the mcmc path).

Hope this gives you a start.

Related Question