Basically what is being done is they are modeling observed survival as a Gumbel distribution and censoring as a Gumbel survival function... I'm thinking the censored values are contributing nothing to this model and the fitted survival data is entirely based on a gumbel distribution fit of observed deaths.
That's not correct. A survival model has to incorporate the contributions of censored observations to the likelihood. Otherwise there is substantial risk of bias.
Location-scale survival modeling in this example is based on the form:
$$\log T = \eta + \sigma W ,$$
where $T$ is survival time, $\eta$ is a location parameter (the linear predictor as a function of covariate values), and $\sigma$ is a scale parameter (represented as `s` in your code, I think).
$W$ has a probability distribution corresponding to the underlying parametric model. For the Weibull model in your example, $W$ is standard minimum-extreme-value (one type of Gumbel).
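To make that concrete, here are the standard forms for the minimum-extreme-value distribution, written with the symbols above. The subscripted names $f_W$, $S_W$ and $S_T$ are my own notation, not something taken from your code:

$$f_W(w) = \exp\left(w - e^{w}\right), \qquad S_W(w) = \exp\left(-e^{w}\right),$$

$$S_T(t) = S_W\!\left(\frac{\log t - \eta}{\sigma}\right) = \exp\left[-\left(\frac{t}{e^{\eta}}\right)^{1/\sigma}\right].$$

The last expression is the survival function of a Weibull distribution with scale $e^{\eta}$ and shape $1/\sigma$, which is why a Gumbel model for $\log T$ and a Weibull model for $T$ are the same thing.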
The 2 lines of code that you rightly note to be the heart of the example describe the contributions to the likelihood separately for uncensored (first line) and right-censored (second line) survival times. This page summarizes contributions to likelihood for various types of censoring and truncation in survival analysis.
An uncensored observation contributes a factor proportional to the density of the lifetime distribution at its observation time. That's what your first line of code describes.
Provided that censoring isn't informative, a right-censored observation contributes a factor proportional to the survival function of the lifetime distribution at the (censored) observation time. (A right-censored observation provides information that an individual survived at least that long, which is the survival function of the distribution.) That's what your second line of code represents. In this case there is no modeling of censoring times, just an incorporation of likelihood contributions from censored observations into the model.
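As a rough illustration of those two kinds of contribution, here is a minimal plain-Python sketch of the log-likelihood on the $\log T$ scale. It is not your PyMC3 code: the function names and the toy data are made up for the example, and it assumes the standard minimum-extreme-value forms for the density and survival function.

```python
import math

def gumbel_min_logpdf(y, eta, sigma):
    """Log density of y = log(T) when (y - eta) / sigma is standard minimum-extreme-value."""
    z = (y - eta) / sigma
    return z - math.exp(z) - math.log(sigma)

def gumbel_min_logsf(y, eta, sigma):
    """Log survival function of y = log(T) under the same model."""
    z = (y - eta) / sigma
    return -math.exp(z)

def log_likelihood(log_times, events, eta, sigma):
    """Sum the log density for observed deaths and the log survival for right-censored times."""
    total = 0.0
    for y, observed in zip(log_times, events):
        if observed:
            total += gumbel_min_logpdf(y, eta, sigma)  # uncensored: density term
        else:
            total += gumbel_min_logsf(y, eta, sigma)   # right-censored: survival term
    return total

# Hypothetical toy data: log survival times and event indicators (True = death observed).
log_times = [0.2, 1.1, 1.5, 2.0]
events = [True, False, True, False]

full = log_likelihood(log_times, events, eta=1.0, sigma=0.5)
deaths_only = log_likelihood(
    [y for y, e in zip(log_times, events) if e],
    [True, True],
    eta=1.0,
    sigma=0.5,
)
# full and deaths_only differ, so the censored rows do change the likelihood.
```

Working with the density of $\log T$ rather than of $T$ only drops a factor of $1/t$ per uncensored observation, which does not depend on the parameters. That is the sense of "proportional to" above.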
The `Potential` object provides a way to incorporate such likelihoods into the model without having to define an entire `DensityDist`. This page provides further information.
In this way both uncensored and right-censored observation times provide their contributions to the likelihood. As there is no modeling of the censoring times per se, your sampling from the posterior is simply based on the probability distribution of uncensored lifetimes. There is no need to sample censored observation times, which in this case are only used to construct the likelihood.
It appears the issue had nothing to do with any of the model related topics discussed above. In fact it was a simple bug. Specifically I had overwritten the T from the line:
from theano import tensor as T
The moral of the story (at least for me) is that if you get a "SamplingError: Initial evaluation of model at starting point failed" message, don't start by thinking about sampling, models, or starting points; look for plain bugs first.
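For anyone who hits the same thing, here is a small sketch of how this kind of name shadowing goes wrong. So that it runs without theano installed, it uses `math` as a stand-in for `theano.tensor`. The function `log_scaled` and the value assigned to `T` are made up for illustration, and the error here is a plain `AttributeError` rather than the `SamplingError` I actually saw.

```python
import math as T  # stand-in for `from theano import tensor as T` (assumption for this sketch)

def log_scaled(x):
    # Looks up the global name T at call time, not at definition time.
    return T.log(x)

ok = log_scaled(1.0)  # works while T still refers to the module

T = 12.5  # hypothetical: reusing T for a survival time overwrites the module alias

try:
    log_scaled(1.0)
    error_message = None
except AttributeError as exc:
    # T is now a float, so T.log no longer exists.
    error_message = str(exc)
```

Importing the module under a name that is unlikely to collide with a variable (for example `tt` instead of `T`) avoids the problem.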
Best Answer
EdM's answer on 4/8 is the right one. For Weibull survival models, the AFT and PH formulations describe the same model. Where I was confusing myself was in thinking that a Weibull model was the same as a CoxPH model in which the piecewise-constant hazard of the CoxPH model is replaced by a smooth parametric Weibull fit of the same data. Those are different things.
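A quick numerical check of the AFT/PH equivalence, as a sketch only. It assumes the location-scale form $\log T = \eta + \sigma W$ with $\eta = b_0 + b_1 x$ for a single binary covariate, and the coefficient values are made up. Under that form the Weibull has scale $e^{\eta}$ and shape $1/\sigma$, and the hazard ratio between $x = 1$ and $x = 0$ should equal $\exp(-b_1/\sigma)$ at every time point.

```python
import math

def weibull_hazard(t, eta, sigma):
    """Hazard of a Weibull with scale exp(eta) and shape 1/sigma (AFT parameterization)."""
    shape = 1.0 / sigma
    scale = math.exp(eta)
    return (shape / scale) * (t / scale) ** (shape - 1.0)

# Hypothetical AFT coefficients: intercept, covariate effect, and scale parameter.
b0, b1, sigma = 2.0, 0.5, 0.8

times = [0.5, 1.0, 5.0, 20.0]
hazard_ratios = [
    weibull_hazard(t, b0 + b1, sigma) / weibull_hazard(t, b0, sigma)
    for t in times
]

# The PH log hazard ratio implied by the AFT coefficient is -b1 / sigma.
expected_ratio = math.exp(-b1 / sigma)
```

Every entry of `hazard_ratios` matches `expected_ratio` up to floating-point error, so the same fitted model can be read either as accelerated failure time (coefficient $b_1$ on $\log T$) or as proportional hazards (log hazard ratio $-b_1/\sigma$).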