Solved – In Bayesian analysis, how to sample from full conditional given uniform prior and normal data likelihood

bayesian, gibbs, markov-chain-montecarlo, metropolis-hastings

[EDIT]

This question comes from the example of OpenBUGS manual:

Stagnant: a changepoint problem and an illustration of how NOT to do MCMC!

I also asked another question regarding this example.

[END EDIT]

In Bayesian analysis, assume a simple linear regression model with two straight lines that meet at a certain changepoint $c$. The basic setup is as follows.
\begin{align*}
Y_i \ & \sim \ N(\alpha + \beta_1 (x_i - c), \sigma^2), \; \text{for } x_i \leq c \; (i = 1, \ldots, k) \\
Y_i \ & \sim \ N(\alpha + \beta_2 (x_i - c), \sigma^2), \; \text{for } x_i > c \; (i = k+1, \ldots, n)
\end{align*}
That is, observed $x$'s are ordered from smallest to largest.

The priors are assumed as:
\begin{align*}
\alpha \ & \sim \ N(\mu_{\alpha}, \sigma^2_{\alpha}), \quad \sigma^2 \ \sim \ IG(a, b) \\
\beta_1, \beta_2 \ & \sim \ N(\mu_{\beta}, \sigma^2_{\beta}), \quad c \ \sim \ Unif(c_1, c_2)
\end{align*}
where $c_1, c_2$ are within the observed range of $x$'s.

The data likelihood is then:
\begin{align*}
& p(\mathbf{x}, \mathbf{y}| c,\alpha, \beta_1, \beta_2, \sigma^2) = \prod_{i=1}^{k} p_1(y_i| c, .) \prod_{i=k+1}^{n} p_2(y_i|c,.) \\
& = (2 \pi \sigma^2)^{-n /2} \exp\left\{- \frac{1}{2 \sigma^2} \sum_{i=1}^k (y_i - \alpha - \beta_1 (x_i - c))^2 \right\} \\
& \quad \times \exp\left\{- \frac{1}{2 \sigma^2} \sum_{i=k+1}^n (y_i - \alpha - \beta_2 (x_i - c))^2 \right\}
\end{align*}
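The two-branch likelihood above is straightforward to evaluate numerically. Here is a minimal Python sketch (the function name and signature are my own, not taken from any implementation discussed in this post):

```python
import numpy as np
from scipy.stats import norm

def log_likelihood(c, alpha, beta1, beta2, sigma2, x, y):
    """Two-segment changepoint log-likelihood.

    x is assumed sorted; which branch applies to each point is
    decided by comparing x_i with c, so k = #{i : x_i <= c} is
    implicit in the mask rather than passed in.
    """
    mean = np.where(x <= c,
                    alpha + beta1 * (x - c),   # left segment
                    alpha + beta2 * (x - c))   # right segment
    return norm.logpdf(y, loc=mean, scale=np.sqrt(sigma2)).sum()
```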

And the prior for $c$ is $$
c \ \sim \ Unif(c_1, c_2)
$$

My question is, how do I sample from the posterior distribution of $c$ in MCMC?


I can get the full conditional for $c$ as \begin{align*}
& p(c|.) \propto p(\mathbf{x}, \mathbf{y}| .) p(c) \\
& \propto \exp\left\{-\frac{1}{2\sigma^2} \Big(\beta_1^2 \sum_1^k c^2 + 2c \sum_1^k \beta_1 (y_i - \alpha - \beta_1 x_i) \Big) \right\} \\
& \quad \times \exp\left\{-\frac{1}{2\sigma^2} \Big(\beta_2^2 \sum_{k+1}^n c^2 + 2 c \sum_{k+1}^n \beta_2 (y_i - \alpha - \beta_2 x_i) \Big) \right\} \times \mathbf{I}(c_1, c_2)\\
& \propto \exp\left\{-\frac{1}{2\sigma^2} \Big(c^2 (k\beta_1^2 + (n-k)\beta_2^2) + 2 c \Delta \Big) \right\} \times \mathbf{I}(c_1, c_2) \\
& \ \sim \ N(\mu, \Sigma) \times \mathbf{I}(c_1, c_2)
\end{align*}
where $\Sigma = \frac{\sigma^2}{k\beta_1^2 + (n-k)\beta_2^2}$ and $\mu = \frac{- \Delta}{k\beta_1^2 + (n-k)\beta_2^2}$ with $$
\Delta = \beta_1 \sum_1^k (y_i - \alpha - \beta_1 x_i) + \beta_2 \sum_{k+1}^n (y_i - \alpha - \beta_2 x_i)
$$

Is the full conditional a normal distribution truncated to the interval $(c_1, c_2)$? (I suspect not.) Could I keep drawing $c$ from $N(\mu, \Sigma)$ within the Gibbs sampler until a draw falls in the range, and then accept it? If not, how should I deal with it?
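For reference on the accept-until-in-range idea: when a full conditional really is a normal truncated to $(c_1, c_2)$, no rejection loop is needed, since the truncated normal can be drawn directly by inverse-CDF sampling. A Python sketch using `scipy.stats.truncnorm` (the function name is mine):

```python
import numpy as np
from scipy.stats import truncnorm

def sample_truncated_normal(mu, sigma2, c1, c2, rng=None):
    """Draw from N(mu, sigma2) restricted to (c1, c2).

    Inverse-CDF sampling via scipy.stats.truncnorm: unlike the
    accept-until-in-range loop, it never rejects, even when the
    interval sits far in the tail of the normal.
    """
    sd = np.sqrt(sigma2)
    a, b = (c1 - mu) / sd, (c2 - mu) / sd  # bounds on the standard scale
    return truncnorm.rvs(a, b, loc=mu, scale=sd, random_state=rng)
```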

I thought of Metropolis-Hastings sampling, but that is mostly presented for unconstrained parameters, so I assumed it would not work here.

Any solutions? Thanks very much.

Best Answer

I've figured out this problem, and I thought it would be helpful to anyone interested in this question if I explain it a bit here.

First and foremost, the full conditional $p(c|.)$ derived in the Question part is not correct!

In my question, I derived the full conditional as follows.
\begin{align*}
& p(c|.) \propto p(\mathbf{x}, \mathbf{y}| .) p(c) \\
& \propto \exp\left\{-\frac{1}{2\sigma^2} \Big(\beta_1^2 \sum_1^k c^2 + 2c \sum_1^k \beta_1 (y_i - \alpha - \beta_1 x_i) \Big) \right\} \\
& \quad \times \exp\left\{-\frac{1}{2\sigma^2} \Big(\beta_2^2 \sum_{k+1}^n c^2 + 2 c \sum_{k+1}^n \beta_2 (y_i - \alpha - \beta_2 x_i) \Big) \right\} \times \mathbf{I}(c_1, c_2) \\
& \propto \exp\left\{-\frac{1}{2\sigma^2} \Big(c^2 (k\beta_1^2 + (n-k)\beta_2^2) + 2 c \Delta \Big) \right\} \times \mathbf{I}(c_1, c_2)
\end{align*}
where $\Delta = \beta_1 \sum_1^k (y_i - \alpha - \beta_1 x_i) + \beta_2 \sum_{k+1}^n (y_i - \alpha - \beta_2 x_i)$.

But the exponential part can no longer be recognized as a normal kernel, as I did in the Question part, because $k$ is now a function of $c$, as pointed out by @Xi'an. Think about the sampling process: each time $c$ is sampled, $k$ must be updated as $k = \#\{i : x_i \leq c\}$ by definition. Therefore the exponential part actually consists of several piecewise exponential curves, with discontinuities at the observed $x$ values.
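The piecewise structure is easy to see numerically. A Python sketch of the unnormalized log full conditional, with the branch indicator recomputed from $c$ on every call (the name and signature are mine, not from the linked R code):

```python
import numpy as np

def log_full_conditional_c(c, alpha, beta1, beta2, sigma2, x, y, c1, c2):
    """Unnormalized log full conditional of the changepoint c.

    The branch indicator (x <= c) is recomputed from c on every
    call, which is exactly why the density is a different
    exponential curve on each interval between observed x values.
    """
    if not (c1 < c < c2):
        return -np.inf  # the Unif(c1, c2) prior truncates the support
    resid = np.where(x <= c,
                     y - alpha - beta1 * (x - c),
                     y - alpha - beta2 * (x - c))
    return -0.5 * np.sum(resid ** 2) / sigma2
```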

Nevertheless, we can still sample $c$ from this full conditional via slice sampling or rejection sampling. I've implemented the whole sampling procedure for all parameters. All other parameters (i.e., $\alpha, \beta_1, \beta_2, \sigma^2$) were sampled via Gibbs sampling, while $c$ can be sampled via either slice sampling or rejection sampling. For anyone who is interested, you can find my R implementation here. The sampling procedure and the implementation are correct, since the code reproduces the results from OpenBUGS.
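A rejection sampler along these lines, with a $Unif(c_1, c_2)$ proposal, might look like the following Python sketch. The grid-based envelope constant is my own simplification, not the author's R implementation; `logpost` stands for any function returning the unnormalized log full conditional of $c$:

```python
import numpy as np

def sample_c_rejection(logpost, c1, c2, n_grid=200, rng=None):
    """Rejection sampling for c with a Unif(c1, c2) proposal.

    The envelope constant is approximated by the maximum of logpost
    over a grid; midpoints between the observed x values could be
    added to the grid to guard against the discontinuities.
    """
    rng = np.random.default_rng() if rng is None else rng
    grid = np.linspace(c1, c2, n_grid)
    log_m = max(logpost(g) for g in grid)  # approximate envelope constant
    while True:
        cand = rng.uniform(c1, c2)
        if np.log(rng.uniform()) < logpost(cand) - log_m:
            return cand
```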

One issue I still have, though, is that I also tried Metropolis-Hastings sampling for $c$ with a uniform proposal, but it's not working at this point. (See the part of code for M-H sampling here. Note that delta and cand.delta in the code are the whole exponent, which differs from my notation here.) I suspect the reason is that I am not sure how to find the right proposal distribution for M-H sampling. I would be very grateful to anyone who can help figure out how to make Metropolis-Hastings sampling work for this problem.
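For what it's worth, with a $Unif(c_1, c_2)$ independence proposal the Hastings correction cancels, so a single M-H update only needs the ratio of the unnormalized full conditional at the candidate and the current value. A hedged Python sketch (names are mine; this is not the author's R code):

```python
import numpy as np

def mh_step_c(c_curr, logpost, c1, c2, rng=None):
    """One Metropolis-Hastings update for c with a Unif(c1, c2)
    independence proposal.

    The proposal density is constant on (c1, c2), so the Hastings
    correction cancels and acceptance depends only on the ratio of
    the (unnormalized) full conditional at the candidate and the
    current value; the same exponent, including its 1/(2*sigma^2)
    factor, must be used for both evaluations.
    """
    rng = np.random.default_rng() if rng is None else rng
    cand = rng.uniform(c1, c2)
    if np.log(rng.uniform()) < logpost(cand) - logpost(c_curr):
        return cand    # accept the proposed changepoint
    return c_curr      # reject: keep the current value
```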
