[Math] Computing posterior distribution for AR(1) model

bayesian, probability-distributions

Question: For this question, note the notation $y_{1:T} = (y_1, y_2, \cdots, y_T)$, i.e., a vector of random variables.

Consider the following AR(1) model:
\begin{align*}
y_{t+1} = \phi y_t + \sigma \eta_t \ \ \cdots (a)
\end{align*}
for $t = 1, 2, \cdots, T-1$, where
\begin{align*}
\eta_t \stackrel{iid}{\sim} N(0,1) \ \ \cdots (b)
\end{align*}
with $\eta_t$ independent of $y_k$ for $k \le t$, and where
\begin{align*}
y_1 \sim N\left(0, \frac{\sigma^2}{1-\phi^2}\right) \ \ \cdots (c)
\end{align*}
Define $\theta = (\phi, \sigma)$ and consider a prior distribution given by:
\begin{align*}
p(\theta) \propto \frac{1}{\sigma} \ \text{for} \ -\infty < \phi < \infty \ \text{and} \ 0<\sigma<\infty \ \ \cdots (1)
\end{align*}
Also define the conditional likelihood function to be:
\begin{align*}
p(y_{2:T} \mid y_1, \theta) = \left(2\pi \sigma^2 \right)^{-\frac{T-1}{2}}\exp\left[- \frac{1}{2\sigma^2}\sum_{t=2}^T \left(y_t - \phi y_{t-1} \right)^2\right] \ \ \cdots (2)
\end{align*}

Show that the conditional posterior distribution, corresponding to the conditional likelihood function in $(2)$ and the prior density in $(1)$, may be obtained analytically, whereas the full posterior distribution, corresponding to the model in $(a)-(c)$ and the prior density in $(1)$, requires an alternative computational approach.


My Working: So firstly, I worked out the likelihood function corresponding to the AR(1) model described by $(a)-(c)$. My working is as follows:

We are given $y_1 \sim N\left(0, \frac{\sigma^2}{1-\phi^2}\right)$, so the pdf is given by:
\begin{align*}
p(y_1 \mid \theta) = \left(2\pi\left( \frac{\sigma^2}{1-\phi^2}\right) \right)^{-\frac{1}{2}}\exp\left[ - \frac{y_1^2}{2\left(\frac{\sigma^2}{1-\phi^2}\right)} \right]
\end{align*}
Conditionally, we have $y_2 \mid y_1 \sim N\left( \phi y_1, \sigma^2\right)$, so the pdf is given by:
\begin{align*}
p(y_2 \mid y_1, \theta) = \left(2\pi \sigma^2\right)^{-\frac{1}{2}} \exp\left[-\frac{1}{2\sigma^2} \left(y_2 - \phi y_1\right)^2 \right]
\end{align*}
Similarly, $y_3 \mid y_2, y_1 \equiv y_3 \mid y_2 \sim N\left(\phi y_2, \sigma^2\right)$, so the pdf is given by:
\begin{align*}
p(y_3 \mid y_2, y_1, \theta) = \left(2\pi \sigma^2\right)^{-\frac{1}{2}} \exp\left[-\frac{1}{2\sigma^2} \left(y_3 - \phi y_2\right)^2 \right]
\end{align*}
So in general, $y_t \mid y_{t-1}, y_{t-2}, \cdots, y_1 \equiv y_t \mid y_{t-1} \sim N\left(\phi y_{t-1}, \sigma^2\right)$, with pdf:
\begin{align*}
p(y_t \mid y_{t-1}, y_{t-2}, \cdots, y_1, \theta) = \left(2\pi \sigma^2\right)^{-\frac{1}{2}} \exp\left[-\frac{1}{2\sigma^2} \left(y_t - \phi y_{t-1}\right)^2 \right]
\end{align*}
Using the method of composition, we have:
\begin{align*}
p(y_{1:T} \mid \theta) & = p(y_1 \mid \theta)p(y_2 \mid y_1, \theta)p(y_3 \mid y_1, y_2, \theta) \cdots p(y_T \mid y_1, y_2, \cdots, y_{T-1}, \theta) \\
& = p(y_1 \mid \theta) \prod_{t=2}^T p(y_t \mid y_{t-1}, \theta)
\end{align*}
Thus the likelihood function, evaluated at a given value of $\theta = \left(\phi, \sigma\right)$, is given by:
\begin{align}
L(\theta) & = \left\{ \left(2\pi\left( \frac{\sigma^2}{1-\phi^2}\right) \right)^{-\frac{1}{2}}\exp\left[ - \frac{y_1^2}{2\left(\frac{\sigma^2}{1-\phi^2}\right)} \right]\right\} \prod_{t=2}^T \left(2\pi \sigma^2\right)^{-\frac{1}{2}} \exp\left[-\frac{1}{2\sigma^2} \left(y_t - \phi y_{t-1}\right)^2 \right] \\
& \propto (1-\phi^2)^{\frac{1}{2}}\sigma^{-T} \exp\left[-\frac{\sum_{t=2}^T\left(y_t - \phi y_{t-1}\right)^2+y_1^2\left(1-\phi^2\right)}{2\sigma^2} \right] \ \ \cdots (W1)
\end{align}

Deriving the conditional likelihood given in $(2)$ can be done as follows:
\begin{align*}
p(y_{2:T} \mid y_1, \theta) & = \frac{p(y_{1:T} \mid \theta)}{p(y_1 \mid \theta)} \\
& = \prod_{t=2}^T p(y_t \mid y_{t-1}, \theta) \\
& = \left(2\pi \sigma^2 \right)^{-\frac{T-1}{2}}\exp\left[- \frac{1}{2\sigma^2}\sum_{t=2}^T \left(y_t - \phi y_{t-1} \right)^2 \right]
\end{align*}
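As a numerical sanity check of these two objects, here is a minimal sketch in Python (assuming numpy; the function names and the series `y` are placeholders, not part of the original question) of the exact log-likelihood $(W1)$ and the conditional log-likelihood $(2)$:

```python
import numpy as np

def full_loglik(y, phi, sigma):
    """Log of the exact likelihood (W1), including the stationary density of y_1.
    Requires |phi| < 1 so that the stationary variance sigma^2/(1 - phi^2) exists."""
    T = len(y)
    resid = y[1:] - phi * y[:-1]                         # y_t - phi*y_{t-1}, t = 2,...,T
    ll_y1 = -0.5 * np.log(2 * np.pi * sigma**2 / (1 - phi**2)) \
            - y[0]**2 * (1 - phi**2) / (2 * sigma**2)
    ll_rest = -0.5 * (T - 1) * np.log(2 * np.pi * sigma**2) \
              - np.sum(resid**2) / (2 * sigma**2)
    return ll_y1 + ll_rest

def conditional_loglik(y, phi, sigma):
    """Log of the conditional likelihood (2), i.e. log p(y_{2:T} | y_1, theta)."""
    T = len(y)
    resid = y[1:] - phi * y[:-1]
    return -0.5 * (T - 1) * np.log(2 * np.pi * sigma**2) \
           - np.sum(resid**2) / (2 * sigma**2)
```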


My Query: I don't really get what the question is asking me to do. What does it mean by 'conditional posterior'? 'Full posterior'? Any assistance will be appreciated!


EDIT 1 PROGRESS: Okay, so I've played around a bit more and made a bit of progress. I interpret 'conditional posterior' as follows:

Notice that when $T$ is large, the contribution of the initial observation $y_1$ to Eqn. $(W1)$ (the factor $(1-\phi^2)^{\frac{1}{2}}$ and the term $y_1^2(1-\phi^2)$ in the exponent) becomes negligible relative to the remaining $T-1$ terms, so we can approximate the full likelihood with the conditional likelihood:
\begin{align*}
p(y_{2:T} \mid y_1, \theta) \propto \sigma^{-(T-1)}\exp\left[- \frac{1}{2\sigma^2}\sum_{t=2}^T \left(y_t - \phi y_{t-1} \right)^2 \right]
\end{align*}
So under the prior $p(\theta) \propto \frac{1}{\sigma}$, we have:
\begin{align*}
p(\theta \mid y_{1:T}) & \propto \sigma^{-T}\exp\left[- \frac{1}{2\sigma^2}\sum_{t=2}^T \left(y_t - \phi y_{t-1} \right)^2 \right] \\
& = \sigma^{-T}\exp\left[-\frac{1}{2\sigma^2}\sum_{t=2}^T \left(y_t^2 -2y_t \phi y_{t-1} + \phi^2 y_{t-1}^2 \right) \right] \\
& = \sigma^{-T}\exp\left[-\frac{1}{2\sigma^2}\left(\underbrace{\sum_{t=2}^Ty_t^2}_{C} - 2\phi\underbrace{\sum_{t=2}^Ty_t y_{t-1}}_{B} + \phi^2 \underbrace{\sum_{t=2}^T y_{t-1}^2}_{A} \right) \right]
\end{align*}
First, note that:
\begin{align*}
p(\phi \mid \sigma, y_{1:T}) & \propto \exp\left[-\frac{1}{2\sigma^2}\left(C-2\phi B + \phi^2A \right) \right] \\
& = \exp\left[-\frac{A}{2\sigma^2} \left(\phi^2 - 2\phi \frac{B}{A} + \frac{C}{A} \right) \right] \\
& = \exp\left[-\frac{A}{2\sigma^2}\left(\left(\phi - \frac{B}{A} \right)^2-\left(\frac{B}{A}\right)^2 + \frac{C}{A} \right) \right] \\
& \propto \exp\left[-\frac{A}{2\sigma^2}\left(\phi-\frac{B}{A}\right)^2 \right] \\
& = \exp\left[-\frac{1}{2\left(\frac{\sigma^2}{A} \right)}\left(\phi-\frac{B}{A}\right)^2 \right]
\end{align*}
So the distribution of $\phi \mid \sigma, y_{1:T}$ is given by:
\begin{gather}
\phi \mid \sigma, y_{1:T} \sim N\left(\frac{B}{A}, \frac{\sigma^2}{A} \right) \\
\implies \phi \mid \sigma, y_{1:T} \sim N\left(\frac{\sum_{t=2}^T y_t y_{t-1}}{\sum_{t=2}^T y_{t-1}^2}, \frac{\sigma^2}{\sum_{t=2}^T y_{t-1}^2} \right)
\end{gather}
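(As a quick aside, the conditional posterior mean $B/A$ is just the no-intercept least-squares slope of $y_t$ on $y_{t-1}$. A two-line numpy comparison, where the array `y` is a placeholder series:)

```python
import numpy as np

y = np.random.default_rng(0).standard_normal(200)        # placeholder series
A, B = np.sum(y[:-1]**2), np.sum(y[1:] * y[:-1])
slope = np.linalg.lstsq(y[:-1, None], y[1:], rcond=None)[0][0]
print(np.isclose(B / A, slope))                           # posterior mean of phi equals the OLS slope
```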
Next, note that:
\begin{align*}
p(\sigma \mid y_{1:T}) & \propto \int_{\phi} \sigma^{-T} \exp\left[-\frac{1}{2\sigma^2} \sum_{t=2}^T \left(y_t - \phi y_{t-1}\right)^2 \right]d\phi \\
& = \sigma^{-T} \int_{\phi} \exp\left[-\frac{A}{2\sigma^2} \left(\left(\phi - \frac{B}{A} \right)^2 - \left(\frac{B}{A}\right)^2 + \frac{C}{A} \right) \right]d\phi \\
& = \sigma^{-T} \int_{\phi} \exp\left[-\frac{A}{2\sigma^2} \left(\phi - \frac{B}{A} \right)^2 + \left(\frac{A}{2\sigma^2}\right) \left(\frac{B}{A}\right)^2 - \left(\frac{A}{2\sigma^2}\right)\left(\frac{C}{A}\right) \right]d\phi \\
& = \sigma^{-T} \exp\left[\frac{B^2/A - C}{2\sigma^2} \right] \int_{\phi} \exp\left[-\frac{A}{2\sigma^2} \left(\phi - \frac{B}{A} \right)^2\right] d\phi \\
& = \sigma^{-T} \exp\left[\frac{B^2/A - C}{2\sigma^2} \right] \left(2\pi\left(\frac{\sigma^2}{A} \right) \right)^{\frac{1}{2}} \\
& \propto \frac{1}{\sigma^{(T-2)+1}}\exp\left[\frac{B^2/A - C}{2\sigma^2} \right]
\end{align*}
Now define $b = \frac{B}{A}$ and notice that:
\begin{align*}
Q(y_{2:T}, b) & = \sum_{t=2}^T \left(y_t - b y_{t-1}\right)^2 \\
& = \sum_{t=2}^T \left(y_t^2 - 2y_t b y_{t-1} + b^2 y_{t-1}^2 \right) \\
& = \sum_{t=2}^T y_t^2 - 2b\sum_{t=2}^T y_t y_{t-1} + b^2 \sum_{t=2}^T y_{t-1}^2 \\
& = C - 2bB+b^2A
\end{align*}
Then clearly,
\begin{align*}
-Q(y_{2:T}, b) = B^2/A-C
\end{align*}
So, writing the exponent above as $-\frac{1}{2\sigma^2}Q(y_{2:T}, b)$, the distribution of $\sigma \mid y_{1:T}$ is given by:
\begin{gather}
\sigma \mid y_{1:T} \sim IG\left(v = T-2, \widehat{\sigma}^2 = \frac{1}{T-2} \left(C - B^2/A \right) \right) \\
\implies \sigma \mid y_{1:T} \sim IG\left(v = T-2, \widehat{\sigma}^2 = \frac{1}{T-2} \left(\sum_{t=2}^T y_t^2 - \frac{\left(\sum_{t=2}^T y_t y_{t-1} \right)^2}{\sum_{t=2}^T y_{t-1}^2} \right) \right)
\end{gather}
Or equivalently:
\begin{gather*}
\sigma \mid y_{1:T} \sim IG\left(v = T-2, \widehat{\sigma}^2 = \frac{1}{T-2}Q(y_{2:T}, b) \right)
\end{gather*}
Thus, we can derive the conditional posterior distribution (the posterior based on the conditional likelihood $(2)$ and the prior $(1)$) analytically as:
\begin{align*}
p(\phi, \sigma \mid y_{1:T}) = p(\phi \mid \sigma, y_{1:T})p(\sigma \mid y_{1:T})
\end{align*}
where $p(\phi \mid \sigma, y_{1:T})$ and $p(\sigma \mid y_{1:T})$ are derived above.
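To check this algebra numerically, here is a minimal method-of-composition sampler for this conditional posterior, a sketch assuming numpy (the function name and the data array `y` are placeholders): it draws $\sigma^2 \mid y_{1:T}$ as $Q/\chi^2_{T-2}$, which matches the $IG\left(v=T-2,\ \widehat{\sigma}^2 = Q/(T-2)\right)$ statement above under the usual inverted-gamma convention, and then draws $\phi \mid \sigma, y_{1:T}$ from the normal conditional.

```python
import numpy as np

def sample_conditional_posterior(y, n_draws=10_000, rng=None):
    """Method-of-composition draws of (phi, sigma) from the posterior based on
    the conditional likelihood (2) and the prior (1)."""
    rng = np.random.default_rng() if rng is None else rng
    y = np.asarray(y, dtype=float)
    A = np.sum(y[:-1] ** 2)           # A = sum_{t=2}^T y_{t-1}^2
    B = np.sum(y[1:] * y[:-1])        # B = sum_{t=2}^T y_t y_{t-1}
    C = np.sum(y[1:] ** 2)            # C = sum_{t=2}^T y_t^2
    Q = C - B**2 / A                  # residual sum of squares at b = B/A
    T = len(y)
    # sigma^2 | y_{1:T}:  Q / chi^2_{T-2}  (scaled inverse chi-squared, nu = T-2)
    sigma2 = Q / rng.chisquare(T - 2, size=n_draws)
    # phi | sigma, y_{1:T} ~ N(B/A, sigma^2 / A)
    phi = rng.normal(B / A, np.sqrt(sigma2 / A))
    return phi, np.sqrt(sigma2)
```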


However, if we use the full likelihood (which is what the second part of the question is asking about), how can we derive the 'full posterior'? I do not see any obvious way to find the appropriate normalising constants. I'm assuming I need to use Gibbs, M-H, or some other kind of MCMC sampling scheme?
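For instance, a random-walk Metropolis step on $(\phi, \log\sigma)$ targeting the full posterior, i.e. the exact likelihood $(W1)$ times the prior $(1)$, might look like the following sketch (assuming numpy; the function names, step sizes and starting values are placeholders to be tuned, and draws are restricted to $|\phi|<1$ because the stationary density of $y_1$ requires it):

```python
import numpy as np

def log_post_full(phi, log_sigma, y):
    """Log of the full (unnormalised) posterior: exact likelihood (W1) x prior 1/sigma.
    Working on log(sigma) makes the 1/sigma prior flat, so no extra term is needed."""
    if abs(phi) >= 1:                                   # stationary start needs |phi| < 1
        return -np.inf
    sigma = np.exp(log_sigma)
    T = len(y)
    resid = y[1:] - phi * y[:-1]
    return (0.5 * np.log(1 - phi**2) - T * np.log(sigma)
            - (np.sum(resid**2) + y[0]**2 * (1 - phi**2)) / (2 * sigma**2))

def rw_metropolis(y, n_iter=20_000, step=(0.05, 0.05), rng=None):
    """Random-walk Metropolis on (phi, log sigma) targeting the full posterior."""
    rng = np.random.default_rng() if rng is None else rng
    theta = np.array([0.0, np.log(np.std(y))])          # crude starting values
    lp = log_post_full(theta[0], theta[1], y)
    draws = np.empty((n_iter, 2))
    for i in range(n_iter):
        prop = theta + rng.normal(0.0, step)            # symmetric proposal
        lp_prop = log_post_full(prop[0], prop[1], y)
        if np.log(rng.uniform()) < lp_prop - lp:        # accept/reject
            theta, lp = prop, lp_prop
        draws[i] = theta
    return draws[:, 0], np.exp(draws[:, 1])             # draws of (phi, sigma)
```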

Best Answer

Not sure how much help this is, but it might give you some inspiration. Notice that $p(y_2, y_3, \dots, y_T \mid y_1, \theta)$ is conditional on the first observation. For maximum likelihood estimation of an AR(1) process, consider the two densities:
\begin{align*}
p(y_1, y_2, y_3, \dots, y_T; \phi) &= p(y_1; \phi)\prod_{t=2}^T p(y_t \mid y_{t-1}; \phi) \tag{i} \\
p(y_2, y_3, \dots, y_T \mid y_1; \phi) &= \prod_{t=2}^T p(y_t \mid y_{t-1}; \phi) \tag{ii}
\end{align*}
The first one requires a numerical solution, whereas the second one can actually be solved analytically. My guess is that 'conditional posterior' refers to (a Bayesian version of) (ii) and that 'full posterior' refers to (a Bayesian version of) (i).
