Note that because your likelihood function factors into a product of terms, one for each $ \alpha_i $, the data carry no information about correlation between them. Note also that the $ d_i $ variables already account for the length of each interval: a longer interval gives more chances for events, and hence generally a larger $ d_i $.
The most basic way to "go Bayesian" here is to use independent uniform priors $ p(\alpha_i) = 1 $. Since $ 0 < \alpha_i < 1 $, this is a proper prior, and hence the posterior is also proper. The posterior is a product of independent beta distributions, $ \alpha_i \mid \text{data} \sim \text{Beta}(n_i - d_i + 1,\, d_i + 1) $. This is easy to simulate from, so you can generate the posterior distribution of the survival curve, using the rbeta() function in R for example.
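As a rough sketch of that simulation (in Python rather than R, using `numpy.random.beta` as the analogue of `rbeta()`; the $ n_i $, $ d_i $ values below are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical life-table data: n_i at risk and d_i events in each interval.
n = np.array([100, 90, 75, 60])
d = np.array([5, 8, 6, 10])

# With independent uniform priors, the posterior of each conditional
# survival probability is alpha_i ~ Beta(n_i - d_i + 1, d_i + 1).
n_sims = 10_000
alpha = rng.beta(n - d + 1, d + 1, size=(n_sims, len(n)))

# Each posterior draw of the survival curve is the cumulative product
# S(t_j) = prod_{i <= j} alpha_i.
surv = np.cumprod(alpha, axis=1)

# Pointwise posterior summaries of the survival curve.
median = np.median(surv, axis=0)
lo, hi = np.percentile(surv, [2.5, 97.5], axis=0)
```

Each row of `surv` is one draw of the whole (discrete) survival curve, so pointwise credible bands come straight from the columns.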
I think this gets at your main question about a "simpler" method. Below are just the beginnings of an idea for a better model that retains the flexible KM form for the survival function.
I think the main problem with the KM curve is in the survival function, though, not in the prior. For example, why should the $ t_i $ values correspond to the time points that happened to be observed? Wouldn't it make more sense to place them at points corresponding to meaningful event times based on the actual process?
If the observed time points are too far apart, the KM curve will be "too smooth". If they are too close, the KM curve will be "too rough", and potentially exhibit abrupt changes.
One way to deal with the "too rough" problem is to place a correlated prior on $\alpha $ such that $\alpha_i \approx \alpha_{i+1} $. The effect of this prior is to shrink nearby parameters closer together. You could work in the "log-odds" space $\eta_i = \log\left(\frac{\alpha_i}{1-\alpha_i}\right) $ and use a $k$th-order random walk prior on $\eta $. For a first-order random walk this introduces penalties of the form $-\tau(\eta_i - \eta_{i-1})^2 $ into the log-likelihood. The BayesX software has some very good documentation of this kind of smoothing. Basically, choosing the order $k$ is like doing a $k$th-order local polynomial fit; if you like splines, choose $k=3$. Of course, by using a "fine" time grid you will have time points with no observations. However, this complicates your likelihood function, as the $ n_i, d_i$ are missing for some $i $. For example, if $(t_0,t_1) $ was split into three "finer" intervals $(t_{00}, t_{01}, t_{02}, t_{10}) $, then you don't know $ n_{02}, n_{10}, d_{01}, d_{02}, d_{10} $ but only $ n_1 = n_{01}$ and $ d_1 = d_{01} + d_{02} + d_{10} $. So you would probably need to treat these as "missing data" and use an EM algorithm, or perhaps VB (provided you're not going down the MCMC path).
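A minimal sketch of what such a penalized log-likelihood could look like (Python; the function name, the toy data, and treating the smoothing weight `tau` as fixed are all illustrative assumptions, not part of BayesX):

```python
import numpy as np

def penalized_loglik(alpha, n, d, tau):
    """Binomial log-likelihood for the conditional survival probabilities
    alpha_i, minus a first-order random-walk penalty on the log-odds eta_i."""
    alpha = np.asarray(alpha, dtype=float)
    # Likelihood: n_i - d_i "survivals" and d_i events in interval i.
    loglik = np.sum((n - d) * np.log(alpha) + d * np.log1p(-alpha))
    # Log-odds transform eta_i = log(alpha_i / (1 - alpha_i)).
    eta = np.log(alpha / (1.0 - alpha))
    # First-order random-walk penalty: tau * sum (eta_i - eta_{i-1})^2.
    penalty = tau * np.sum(np.diff(eta) ** 2)
    return loglik - penalty
```

Maximizing this over `alpha` shrinks neighbouring $\eta_i$ towards each other; a larger `tau` gives a smoother curve, and the penalty vanishes when all $\eta_i$ are equal.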
Hope this gives you a start.
A Poisson process is a model for a stream of "random" arrivals and has the following properties:

- There can be at most one arrival at any instant $t$.
- The number of arrivals in any interval $(t_1,t_2]$ is a Poisson random variable, here denoted $\mathbb N(t_1,t_2]$.
- For $t_1 < t_2 \leq t_3 < t_4 \leq t_5 < t_6 \leq \cdots \leq t_{2n-1} < t_{2n}$, the Poisson random variables $\mathbb N(t_1,t_2], \mathbb N(t_3,t_4], \mathbb N(t_5,t_6], \cdots , \mathbb N(t_{2n-1},t_{2n}]$, which count the numbers of arrivals in the $n$ disjoint or non-overlapping time intervals $(t_1,t_2], (t_3,t_4], (t_5, t_6], \cdots , (t_{2n-1},t_{2n}]$, are _independent_ random variables.
- The probability of exactly one arrival in a small time interval $(t, t+\Delta t]$ is proportional to the length $\Delta t$ of the interval; the probability of two or more arrivals in this small interval is $o(\Delta t)$ and can be neglected in the limit as $\Delta t \to 0$.
The constant of proportionality in this last item is a constant $\lambda > 0$ for homogeneous Poisson processes, but is allowed to vary with time for nonhomogeneous processes. That is, the probability of one arrival in the vanishingly small interval $(t, t+\Delta t]$ is $\lambda(t)\Delta t$, while the probability of no arrivals during this interval is $1 - \lambda(t)\Delta t$. Here, of course, we assume that $\lambda(t) > 0$ for all $t$; $\lambda(t)$ is called the _intensity_ of the process at time $t$.
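To make the nonhomogeneous case concrete, here is a sketch of simulating such a process by thinning (the Lewis–Shedler approach; the example intensity $\lambda(t) = 2 + \sin t$ and its upper bound are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(42)

def simulate_nhpp(lam, lam_max, t_end, rng):
    """Arrival times of a nonhomogeneous Poisson process on (0, t_end],
    simulated by thinning: propose candidates at the constant rate lam_max,
    then keep each candidate t with probability lam(t) / lam_max."""
    t, arrivals = 0.0, []
    while True:
        t += rng.exponential(1.0 / lam_max)   # candidate inter-arrival gap
        if t > t_end:
            return np.array(arrivals)
        if rng.random() < lam(t) / lam_max:   # accept with prob lam(t)/lam_max
            arrivals.append(t)

# Example intensity, bounded above by lam_max = 3.
lam = lambda t: 2.0 + np.sin(t)
counts = [len(simulate_nhpp(lam, 3.0, 10.0, rng)) for _ in range(2000)]
# E[N(0,10]] = integral of lam(t) over (0,10] = 20 + (1 - cos 10) ≈ 21.84
```

The average of `counts` should match the integral of $\lambda(t)$ over $(0,10]$, which is the mean of $\mathbb N(0,10]$ discussed below.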
Let $P_0(t)$ denote the probability that there are no arrivals in
the interval $(0,t]$. If no arrivals occurred in $(0,t+\Delta t]$,
then it must be that there are no arrivals in $(0,t]$ and that
there are no arrivals in $(t,t+\Delta t]$. The numbers of arrivals in these
two disjoint time intervals are independent random variables, and so we see
that
$$\begin{align}
P_0(t+\Delta t) &= P_0(t)(1-\lambda(t)\Delta t)\\
P_0(t+\Delta t) - P_0(t) &= - \lambda(t)P_0(t)\Delta t\\
\frac{P_0(t+\Delta t) - P_0(t)}{\Delta t} &= -P_0(t)\lambda(t)\\
\frac{\mathrm dP_0(t)}{\mathrm dt} &= -P_0(t)\lambda(t)\\
P_0(t) &= \exp\left(-\int_0^t \lambda(\tau)\,\mathrm d\tau\right)\tag{1}\\
&= \exp\left(-t\cdot\bar{\lambda}(0,t]\right)\tag{2}
\end{align}$$
where the initial condition $P_0(0) = 1$ has been used, and $\bar{\lambda}(t_1,t_2]$ denotes the average value
$\displaystyle\frac{1}{t_2-t_1}\int_{t_1}^{t_2}\lambda(t)\,\mathrm dt$
over the time interval $(t_1,t_2]$.
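As a quick sanity check on $(1)$ and $(2)$: for a homogeneous process with constant intensity $\lambda(t) = \lambda$, the average intensity is $\bar{\lambda}(0,t] = \lambda$ and both expressions reduce to
$$P_0(t) = \exp\left(-\int_0^t \lambda\,\mathrm d\tau\right) = e^{-\lambda t},$$
the familiar probability that a homogeneous Poisson process has no arrivals in $(0,t]$, i.e. that the first (exponentially distributed) inter-arrival time exceeds $t$.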
Skipping additional details, I will assert that the parameter of
the Poisson random variable
$\mathbb N(t_1,t_2]$ is $\displaystyle \int_{t_1}^{t_2}\lambda(t)\,\mathrm dt$.
Thus, the _average_ number of arrivals in $(t_1,t_2]$ is
$$E\left[\mathbb N(t_1,t_2]\right] = \int_{t_1}^{t_2}\lambda(t)\,\mathrm dt
= (t_2-t_1)\bar{\lambda}(t_1,t_2].\tag{3}$$
Note that the average number of arrivals in $(t_1,t_2]$ per unit time is
$\bar{\lambda}(t_1,t_2]$ and is called the average intensity over this time
interval while $\lambda(t)$ is called the (instantaneous) intensity at time $t$.
Poisson processes deal with a stream of arrivals whereas hazard rates
and survival analysis deal with only one arrival --
the arrival of the Angel of Death! Consider a system
that is put into operation at time $0$ and fails at some random time
$X > 0$. The hazard rate
function $h(t)$ gives the conditional probability, per unit time, of the system
failing in the interval $(t,t+\Delta t]$ given that the system
is still in working condition at time $t$. Thus,
$$\begin{align}
h(t)\Delta t &= P\{X \in (t,t+\Delta t]\mid X > t\}\\
&= \frac{P\left(\{X \in (t,t+\Delta t]\}\cap \{X > t\}\right)}{P\{X > t\}}\\
&= \frac{P\{X \in (t,t+\Delta t]\}}{P\{X > t\}}\\
&= \frac{f_X(t)\Delta t}{1 - F_X(t)}.
\end{align}$$
Consequently,
$$\begin{align}
\int_0^t h(\tau)\,\mathrm d\tau
&= \int_0^t \frac{f_X(\tau)}{1 - F_X(\tau)}\,\mathrm d\tau\\
&= - \ln (1-F_X(\tau))\big|_0^t\\
&= -\ln (1-F_X(t))\\
1-F_X(t) = P\{X > t\}
&= \exp \left(- \int_0^t h(\tau)\,\mathrm d\tau\right)\tag{4}
\end{align}$$
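Equation $(4)$ can be checked numerically for a concrete case. A sketch (Python; the Weibull hazard $h(t) = k\,t^{k-1}$ with shape $k = 1.7$ and scale $1$ is just a convenient example whose cumulative hazard $t^k$ is known in closed form):

```python
import numpy as np

# Weibull(shape k, scale 1): F(t) = 1 - exp(-t^k), f(t) = k t^(k-1) exp(-t^k),
# hence hazard h(t) = f/(1-F) = k t^(k-1) and cumulative hazard t^k.
k = 1.7
h = lambda t: k * t ** (k - 1.0)
survival = lambda t: np.exp(-t ** k)   # 1 - F(t)

# Trapezoidal rule for the cumulative hazard on (0, 3].
t = np.linspace(1e-6, 3.0, 200_001)
y = h(t)
cum_hazard = np.sum((y[1:] + y[:-1]) / 2.0 * np.diff(t))

# Equation (4): 1 - F(t) should equal exp(-cumulative hazard at t).
```

Here `cum_hazard` should be close to $3^k$, and `exp(-cum_hazard)` close to the survival probability $1 - F_X(3)$.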
which of course looks a lot like $(1)$, and both integrals
are telling us the probability that there are no arrivals
in $(0,t]$. However, if the first arrival after $0$ occurs
at time $T$, then analysis of the time of the next arrival
is based on $\lambda(t)$ for $t \geq T$ whereas there are no
new arrivals in survival analysis: the system is dead and that's
all there is to it. Now, we can extend the paradigm to say
that the failed system is instantaneously replaced by a brand-new
system that begins operating at time $T$, but the analysis
now begins anew and the hazard rate $\hat{h}(t)$ for the replacement
is $h(t-T)$ etc. In other words, the probability that the
replacement is struck dead in $(T, T+\Delta t]$ is $h(0)\Delta t$,
not $h(T)\Delta t$.
What the OP conjectured, viz.
To me, it seems like the intensity function deals with reoccurring failure, while the hazard function deals only with the time to first failure?
is correct.
Best Answer
In general, the times $T_i$ are neither independent nor identically distributed, so questions 1 and 2 are answered in the negative. It is not too difficult to find examples in which they are clearly dependent and/or clearly differently distributed; I leave such a demonstration for someone else, and will instead try to derive the distribution of $T_1$.
First recall that if $\lambda(t)=\lambda$ is fixed (i.e., for the usual homogeneous Poisson process), the probability that there is no point in an interval of length $s$ is $e^{-\lambda s}$. This is because we know that the number of points $N_s$ in an interval of length $s$ is Poisson distributed with parameter $\lambda s$ (constant rate $\times$ interval length): $$ \text{Prob}[N_s = n] = e^{-\lambda s} \frac{(\lambda s)^n}{n!} $$ so $$ \text{Prob}[N_s = 0] = e^{-\lambda s} $$
Secondly, consider the probability that no point occurs in the interval $[0,s]$ for the nonhomogeneous process. For convenience, write $\Lambda(t_1,t_2) = \int_{t_1}^{t_2} \lambda(t)\,\text{d}t$. To obtain this probability, we cut the interval into equal parts of length $\Delta$ and then let $\Delta\to 0$. Every part then becomes very (infinitesimally) small, so that we can assume the rate $\lambda(t)$ is constant within that small part. If the rate there is constant (say equal to $\lambda$), the number of points in that part is Poisson distributed with parameter $\lambda \Delta$. So we have: \begin{align*} \text{Prob}\big[\text{no point in }[0,s]\big] & = \lim_{\Delta\to 0}\text{Prob}\big[\text{no point in }[0,\Delta], \ldots, \text{no point in }[s-\Delta,s]\big]\\ & = \lim_{\Delta\to 0}\prod_{i=0}^{s/\Delta-1}\text{Prob}\big[\text{no point in }[i\Delta,(i+1)\Delta]\big]\\ & = \lim_{\Delta\to 0}\prod_{i=0}^{s/\Delta-1} e^{-\lambda(i\Delta)\Delta}\\ & = \lim_{\Delta\to 0} \exp \Big[ - \sum_{i=0}^{s/\Delta-1} \lambda(i\Delta)\Delta \Big]\\ & = \exp \Big[ - \lim_{\Delta\to 0} \sum_{i=0}^{s/\Delta-1} \lambda(i\Delta)\Delta \Big] = \exp \Big[ - \int_0^s \lambda(t) \text{d}t \Big] = e^{-\Lambda(0,s)} \end{align*} because the last limit is nothing but a Riemann sum giving the area under the graph of $\lambda(t)$ from $t=0$ to $t=s$.
Finally, consider the probability density function $f(t)$ of $T_1$: \begin{align*} f(t)\text{d}t & = \text{Prob}[ t \leqslant T_1 < t+\text{d}t ]\\ & = \text{Prob}\big[\text{no point in }[0,t]\big] \cdot \text{Prob}\big[\text{1 point in }[t,t+\text{d}t[\big]\\ & = e^{-\Lambda(0,t)} \cdot e^{-\lambda(t)\text{d}t}\lambda(t)\text{d}t = e^{-\Lambda(0,t)} \lambda(t)\text{d}t \end{align*} (to first order in $\text{d}t$), so the density of $T_1$ is $$ f(t) = e^{-\Lambda(0,t)} \lambda(t) $$ and the distribution function $F(t)=\text{Prob}[T_1\leqslant t]$ follows by integration of $f(t)$ as $$ F(t) = 1- e^{-\Lambda(0,t)} $$
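A quick simulation check of this result (Python; the intensity $\lambda(t) = 2t$, for which $\Lambda(0,t) = t^2$, is an arbitrary example, and the thinning helper `first_arrival` is illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)

# Illustrative intensity lam(t) = 2t, so Lambda(0,t) = t^2 and the claimed
# CDF of the first arrival time is F(t) = 1 - exp(-t^2).
lam = lambda t: 2.0 * t

def first_arrival(lam, lam_max, t_end, rng):
    """First arrival time on (0, t_end], simulated by thinning."""
    t = 0.0
    while True:
        t += rng.exponential(1.0 / lam_max)   # candidate at constant rate lam_max
        if t > t_end:
            return np.inf                     # no arrival before t_end
        if rng.random() < lam(t) / lam_max:   # accept with prob lam(t)/lam_max
            return t

t_end = 4.0   # Lambda(0,4) = 16, so "no arrival" has probability e^-16
samples = np.array([first_arrival(lam, 2.0 * t_end, t_end, rng)
                    for _ in range(20_000)])

# The empirical CDF at t = 1 should be close to F(1) = 1 - e^{-1}.
emp_cdf_at_1 = np.mean(samples <= 1.0)
```

The empirical distribution of `samples` should track $F(t) = 1 - e^{-t^2}$ over the whole range, not just at $t=1$.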