How to make predictions with the posterior predictive distribution

Tags: generative-models, likelihood, posterior, predictive-models, variational-bayes

In the paper Deep Survival Analysis (Ranganath et al., 2016), the authors specify a generative model for survival data.

Say $\mathbf{x}$ denotes the set of covariates, $\mathbf{\beta}$ the parameters for the data with some prior $p(\mathbf{\beta})$, $k$ a fixed scalar, and $n$ the index of an observation. The generative model is defined as

\begin{gather*}
b \sim \mathcal{N}(0, \sigma_b)\\
a \sim \mathcal{N}(0, \sigma_W)\\
z_n \sim \text{DEF}(\mathbf{W})\\
\mathbf{x}_n \sim p(\cdot | \mathbf{\beta}, z_n)\\
t_n \sim \text{Weibull}(\log(1+\exp(z_n^\top a+b)), k)
\end{gather*}

The latent variable $z_n$ is drawn from a deep exponential family (DEF) and generates both the observed covariates and the time to failure. Given covariates $\mathbf{x}$, the model makes predictions via the posterior predictive distribution:

\begin{gather*}
p(t \mid \mathbf{x}) = \int p(t \mid z)\, p(z \mid \mathbf{x})\, dz.
\end{gather*}

My question now is the following:

  1. How are predictions actually computed from this posterior predictive distribution? Do you first find the most probable latent variables $z_n$ given your data $\mathbf{x}$, and then find the most probable $t$ given those most probable $z_n$?

Best Answer

"Makes predictions" can mean a lot of things.

  • You can make a point prediction, i.e., a single number summary.
  • Or a point prediction plus some indication of uncertainty or variability.
  • Or one or multiple quantile forecasts, e.g., to create a fan plot.
  • Or finally a full predictive density.

Of course, from each of these you can derive the ones listed before it.

In your present case, I'd be surprised if there were an analytic way of calculating any kind of prediction. So I assume that the authors simulated from the model many times: draw $z$ from (an approximation to) $p(z|\mathbf{x})$, then draw $t$ from $p(t|z)$. The resulting draws of $t$ form a simulated posterior predictive density. This can then be used to extract quantiles, or the posterior mean forecast by averaging the draws.
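To make the simulation idea concrete, here is a minimal Python sketch. It is not the authors' implementation. The values of `a`, `b`, `k` and the Gaussian used in place of $p(z|\mathbf{x})$ are hypothetical stand-ins (in practice the draws of $z$ would come from whatever approximate posterior the fitted model provides), and I am assuming that the first Weibull argument is the scale and $k$ is the shape.

```python
import numpy as np


def softplus(u):
    # log(1 + exp(u)), computed in a numerically stable way
    return np.logaddexp(0.0, u)


def sample_posterior_predictive(z_samples, a, b, k, rng):
    """Draw one t for each posterior draw of z.

    Assumes t | z ~ Weibull(scale=softplus(z^T a + b), shape=k).
    """
    scale = softplus(z_samples @ a + b)
    # rng.weibull draws from a Weibull with the given shape and scale 1,
    # so multiplying by `scale` gives the desired scale
    return scale * rng.weibull(k, size=scale.shape)


rng = np.random.default_rng(0)

# Hypothetical fitted quantities (illustration only, not from the paper)
a = np.array([0.5, -0.3, 0.8])
b = 1.0
k = 1.5

# Hypothetical Gaussian stand-in for p(z | x) of one new observation
z_mean = np.array([0.2, 1.0, -0.5])
z_std = np.array([0.3, 0.2, 0.4])

S = 20000  # number of Monte Carlo draws
z_samples = rng.normal(z_mean, z_std, size=(S, 3))
t_samples = sample_posterior_predictive(z_samples, a, b, k, rng)

# Summaries extracted from the simulated predictive density
pred_mean = t_samples.mean()
pred_median = np.median(t_samples)
q05, q95 = np.quantile(t_samples, [0.05, 0.95])
print(f"mean={pred_mean:.3f} median={pred_median:.3f} 90% interval=({q05:.3f}, {q95:.3f})")
```

Note that nothing here picks a "most probable" $z$ first: the uncertainty in $z$ is carried through by averaging over all draws, which is exactly what the integral in the posterior predictive distribution does.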

Incidentally, I doubt one will be interested in the most probable posterior value, i.e. a maximum a posteriori (MAP) prediction, or the mode of the posterior predictive density. The mean of the density is a much more common point prediction. Of course, it all depends on what loss function you are trying to minimize with a one-number summary: the posterior mean will minimize squared loss (in expectation), whereas the posterior median will minimize expected absolute loss.
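The link between the loss function and the point prediction can be checked numerically. The sketch below uses a hypothetical right-skewed sample (a Weibull with made-up shape and scale) as a stand-in for simulated predictive draws, and searches a grid of candidate point predictions for the one with the lowest average loss.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical predictive draws: Weibull with shape 1.5 and scale 2.0
samples = 2.0 * rng.weibull(1.5, size=5000)

# Candidate point predictions on a grid with step 0.01
candidates = np.linspace(0.0, 6.0, 601)

# Average loss of each candidate against all draws
diff = candidates[:, None] - samples[None, :]
sq_loss = (diff ** 2).mean(axis=1)
abs_loss = np.abs(diff).mean(axis=1)

best_sq = candidates[sq_loss.argmin()]
best_abs = candidates[abs_loss.argmin()]

print("squared-loss minimizer:", best_sq, "sample mean:", samples.mean())
print("absolute-loss minimizer:", best_abs, "sample median:", np.median(samples))
```

Up to the grid resolution, the squared-loss minimizer coincides with the sample mean and the absolute-loss minimizer with the sample median. For a skewed distribution like this one the two differ noticeably, so the choice of loss matters.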
