Bayesian – How to Interpret Inverse Predictive Posterior in Markov Chain Monte Carlo

bayesian, inverse-prediction, markov-chain-montecarlo, posterior

Suppose I have a parametric nonlinear model, say
$$
y_i |\theta \sim N(f_{\theta}(x_i), \sigma^2)
$$

where the form of $f_\theta$ is known. We get data $d=(y_i,x_i)_{i=1,\ldots,n}$ and obtain posterior samples so we can make inferences about $\theta|d$. The data are collected in such a way that $x$ is not random, but rather fixed by experimental design.
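
For concreteness, here is a minimal sketch of that calibration step. Everything in it is an illustrative assumption rather than part of the actual model: a hypothetical $f_\theta(x) = \theta_1(1 - e^{-\theta_2 x})$, simulated data in place of $d$, flat priors, and a plain random-walk Metropolis sampler for $(\theta, \sigma) \mid d$.

```python
# Minimal sketch of the calibration fit (all specifics are illustrative assumptions):
# a hypothetical f_theta, simulated data standing in for d, flat priors, and a
# plain random-walk Metropolis sampler for (theta, sigma) | d.
import numpy as np

rng = np.random.default_rng(0)

def f(theta, x):
    # hypothetical nonlinear mean function f_theta(x) = theta1 * (1 - exp(-theta2 * x))
    return theta[0] * (1.0 - np.exp(-theta[1] * x))

# x fixed by design; y simulated as a stand-in for the calibration data d
x_cal = np.linspace(0.5, 10.0, 20)
true_theta, true_sigma = np.array([5.0, 0.4]), 0.3
y_cal = f(true_theta, x_cal) + rng.normal(0.0, true_sigma, x_cal.size)

def log_post(params):
    # log posterior kernel of (theta1, theta2, log sigma) given d,
    # with flat priors on theta > 0 and on log sigma
    theta, log_sigma = params[:2], params[2]
    if np.any(theta <= 0):
        return -np.inf
    sigma = np.exp(log_sigma)
    resid = y_cal - f(theta, x_cal)
    return -x_cal.size * log_sigma - 0.5 * np.sum(resid**2) / sigma**2

def metropolis(log_target, init, n_iter=20000, step=0.05):
    # generic random-walk Metropolis; returns the whole chain as an array
    cur = np.asarray(init, dtype=float)
    cur_lp = log_target(cur)
    draws = np.empty((n_iter, cur.size))
    for t in range(n_iter):
        prop = cur + step * rng.normal(size=cur.size)
        prop_lp = log_target(prop)
        if np.log(rng.uniform()) < prop_lp - cur_lp:
            cur, cur_lp = prop, prop_lp
        draws[t] = cur
    return draws

draws = metropolis(log_post, [4.0, 0.5, 0.0])   # columns: theta1, theta2, log sigma
theta_post, sigma_post = draws[5000:, :2], np.exp(draws[5000:, 2])
```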

Now suppose I get new data $y^{(new)}$ but no $x^{(new)}$. Is there some way I can use my previous Bayesian model to make posterior inference about $x^{(new)}$?

EDIT: Apparently, in the frequentist world this is called inverse prediction, and the initial dataset is called a calibration dataset.

EDIT2: What would the consequences be if I were to fit the inverse model
$$
x_i|\theta \sim N(f^{-1}_\theta(y_i),\sigma_x^2)
$$

Note that the $x_i$ in the calibration set are fixed and measured without error, so this seems strange to me and is probably problematic in some way. I could then get my answer by using the predictive posterior $p(x^{(new)}|x)$. In retrospect this approach seems nonsensical.

EDIT3: My particular model can also be formulated as $y_i = f_\theta(x_i) + \epsilon_i$ where $\epsilon_i \sim N(0,\sigma^2)$. I could use my posterior samples to draw from this residual, $e_j \sim \epsilon|d$, and then $f^{-1}_\theta(y^{(new)} - e_j)$ might be samples from the distribution I am interested in. Thoughts?
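
A minimal sketch of this idea, continuing the code above: it reuses the hypothetical $f_\theta$ and the posterior draws `theta_post`, `sigma_post`; the analytic inverse and its domain guard are extra assumptions that come with that particular choice of $f_\theta$.

```python
# Sketch of the EDIT3 plug-in idea: for each posterior draw (theta_j, sigma_j),
# sample a residual e_j and push y_new - e_j through the inverse of the
# hypothetical f, which is bounded above by theta1 (hence the domain guard).
import numpy as np

rng = np.random.default_rng(1)
y_new = 3.2                                    # new observation with unknown x

def f_inv(theta, y):
    # inverse of f(theta, x) = theta1 * (1 - exp(-theta2 * x)), valid for 0 < y < theta1
    return -np.log(1.0 - y / theta[0]) / theta[1]

x_new_draws = []
for theta_j, sigma_j in zip(theta_post, sigma_post):
    e_j = rng.normal(0.0, sigma_j)             # e_j ~ N(0, sigma_j^2)
    y_adj = y_new - e_j
    if 0.0 < y_adj < theta_j[0]:               # keep draws where the inverse exists
        x_new_draws.append(f_inv(theta_j, y_adj))
x_new_draws = np.asarray(x_new_draws)
```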

Best Answer

Denoting all the conditioning explicitly (which you should make a habit of doing in Bayesian analysis), your nonlinear regression model is actually specifying:

$$p(y_i | x_i, \theta, \sigma) = \text{N}(y_i | f_\theta(x_i), \sigma^2).$$

Now, if you want to make a Bayesian inference about any of the values in the conditional part, you are going to need to specify a prior for them. Fundamentally this is no different from any situation in Bayesian analysis; if you want a posterior for the regressors then your model must specify an appropriate prior. I'm going to assume that you will want to model the regressors using a parametric model with an additional parameter vector $\lambda$. In this case, it is useful to decompose the prior for these three conditioning variables in a hierarchical manner as:

$$\begin{align} \text{Prior for model parameters} & & & \pi(\theta, \sigma, \lambda) \\[6pt] \text{Sampling distribution for regressors} & & & \phi(x_i | \theta, \sigma, \lambda) \end{align}$$

I'm also going to assume that the regressors are IID conditional on the model parameters, so that $p(\mathbf{x}| \theta, \sigma, \lambda) = \prod_{i=1}^n \phi(x_i | \theta, \sigma, \lambda)$. If you specify this sampling distribution for the regressors then you will get the joint posterior distribution:

$$\begin{align} p(\mathbf{x}, \theta, \sigma, \lambda | \mathbf{y}) &\propto p(\mathbf{x}, \mathbf{y}, \theta, \sigma, \lambda) \\[12pt] &= \pi(\theta, \sigma, \lambda) \prod_{i=1}^n p(y_i | x_i, \theta, \sigma) \cdot \phi(x_i | \theta, \sigma, \lambda) \\[6pt] &= \pi(\theta, \sigma, \lambda) \prod_{i=1}^n \text{N}(y_i | f_\theta(x_i), \sigma^2) \cdot \phi(x_i | \theta, \sigma, \lambda). \\[6pt] \end{align}$$

Computing the last line of this formula will give you the posterior kernel, and then you can get the posterior distribution by computing the constant for the density directly, or by using MCMC simulation. The marginal posterior for the regressors, $\phi(\mathbf{x}|\mathbf{y})$, follows by integrating out $(\theta, \sigma, \lambda)$, which in MCMC amounts to keeping only the $\mathbf{x}$ components of the draws.
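
As one concrete, hedged instance of this recipe: the sketch below keeps the calibration $x_i$ fixed (as in the question) and gives only $x^{(new)}$ the regressor distribution $\phi$, taken here to be a hypothetical $\text{N}(\mu_x, \tau^2)$ with $\lambda = (\mu_x, \tau)$ held fixed rather than given a prior. It reuses `f`, `x_cal`, `y_cal` and `metropolis` from the earlier sketch and runs MCMC on the joint kernel in $(\theta, \sigma, x^{(new)})$.

```python
# One concrete instance of the kernel above (illustrative assumptions): the
# calibration x_i stay fixed as in the question, and only x_new gets the
# regressor distribution phi, taken to be N(mu_x, tau^2) with lambda = (mu_x, tau)
# held fixed instead of given a prior.
import numpy as np

y_new = 3.2
mu_x, tau = 5.0, 3.0                           # fixed hyperparameters lambda of phi

def log_kernel(params):
    # joint posterior kernel in (theta1, theta2, log sigma, x_new)
    theta, log_sigma, x_new = params[:2], params[2], params[3]
    if np.any(theta <= 0) or x_new <= 0:       # flat priors on theta > 0 and log sigma; x_new kept > 0
        return -np.inf
    sigma = np.exp(log_sigma)
    resid = np.append(y_cal - f(theta, x_cal), y_new - f(theta, x_new))
    log_lik = -resid.size * log_sigma - 0.5 * np.sum(resid**2) / sigma**2
    log_phi = -0.5 * ((x_new - mu_x) / tau) ** 2   # phi(x_new | lambda), up to a constant
    return log_lik + log_phi

draws = metropolis(log_kernel, [4.0, 0.5, 0.0, 4.0], n_iter=40000)
x_new_post = draws[10000:, 3]                  # marginal posterior draws of x_new
```

Keeping only the last column of the draws integrates $(\theta, \sigma)$ out of the joint posterior automatically, which is the MCMC route to the marginal posterior described above.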
