Solved – Seeking a closed form for a posterior distribution

bayesian, distributions, hierarchical-bayesian, posterior

In the book Bayesian Data Analysis by Gelman et al. (3rd edition, 2014), a hierarchical model (or one-way random-effects ANOVA) is presented in section 5.4 as follows,

\begin{equation}\label{eq:lme1}
y_{ij} = b_0 + \lambda_i + \varepsilon_{ij},
\end{equation}

where the data $y_{ij}$ come from the $i$th measuring entity (e.g., student performance in a school district) collected under the $j$th condition (e.g., a school within the district), $b_0$ is the population mean, $\lambda_i$ is the deviation of the $i$th measuring entity from the population mean, and $\varepsilon_{ij}$ is the measuring error ($i=1, 2, \ldots, k;\ j=1, 2, \ldots, n$).

A posterior inference is derived in the book for the effect of each measuring entity, $\theta_i = b_0 + \lambda_i$, based on a Gaussian assumption with known variance $\sigma^2$ for the residuals $\varepsilon_{ij}$ and a prior distribution $N(0, \tau^2)$ for $\lambda_i$. Specifically, the posterior mean and variance of $\theta_i$ are:

\begin{align}
{\rm mean}(\theta_i) &= \frac{\frac{n}{\sigma^2}\bar{y}_{i\cdot}+\frac{1}{\tau^2}b_0}{\frac{n}{\sigma^2}+\frac{1}{\tau^2}} \\[7pt]
{\rm Var}(\theta_i) &= \frac{1}{\frac{n}{\sigma^2}+\frac{1}{\tau^2}}
\end{align}

where $\bar{y}_{i\cdot}=\frac{1}{n}\sum_{j=1}^n y_{ij}$.
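As a sanity check, the two formulas can be evaluated directly. Here is a small Python sketch (the helper name and all numbers are mine, purely illustrative) that computes the posterior mean and variance for one entity:

```python
# Posterior mean and variance for theta_i under the conjugate-normal model,
# following the closed form above. All numeric values are illustrative.

def posterior_theta(ybar_i, n, sigma2, b0, tau2):
    """Precision-weighted posterior for theta_i = b0 + lambda_i."""
    data_precision = n / sigma2      # precision of the group mean ybar_i
    prior_precision = 1.0 / tau2     # precision of the prior N(b0, tau2)
    post_var = 1.0 / (data_precision + prior_precision)
    post_mean = post_var * (data_precision * ybar_i + prior_precision * b0)
    return post_mean, post_var

# Example with made-up numbers: n = 10 observations, sigma2 = 4, prior N(0, 1)
mean_i, var_i = posterior_theta(ybar_i=2.0, n=10, sigma2=4.0, b0=0.0, tau2=1.0)
```

With these values the posterior mean lands between the prior mean 0 and the group mean 2, pulled toward the data because the data precision ($10/4 = 2.5$) exceeds the prior precision ($1$).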

Even though the variance of the $\lambda_i$ is assumed to be known, in practice I could fit the model as a mixed-effects model through, for example, the function lmer() in the R package lme4, and use the estimated variances $\hat{\tau}^2$ and $\hat{\sigma}^2$ to obtain the posterior distribution via the formulas above. Is this a reasonable and solid approach?

I know that I could directly obtain the posterior distribution through R packages such as brms and rstanarm. However, the computational cost is too heavy in my case, which is why I'm trying to see whether the closed form above is a reasonable way to obtain the posterior distribution directly by plugging in the variance estimates $\hat{\tau}^2$ and $\hat{\sigma}^2$ from lmer, rather than going through the typical Bayesian route.

Best Answer

The general concept here is updating a conjugate prior.

I'm trying to see if the above closed form is a feasible solution

I'm not sure what you mean by feasible. For the Gaussian distribution $N\left(\mu, \sigma^2\right)$, any specification with mean parameter $\mu$ unrestricted and variance parameter strictly positive, $\sigma^2 > 0$, defines a unique and valid Gaussian distribution.

I will assume you are asking if the proposed closed form solution is correct.

The closed-form solution you provide is the correct posterior distribution given a Gaussian prior distribution $\theta_i = b_0 + \lambda_i \sim N\left(b_0, \tau^2\right)$ and observed data with known observation variance $\sigma^2$.

Precision $K$ is the inverse of variance $\sigma^2$: $K = \sigma^{-2}$.

Considering the mean of the posterior distribution for $\theta_i$: the numerator in your formula for ${\rm mean}(\theta_i)$ is the precision-weighted combination of the prior mean $b_0$ (with weight $1/\tau^2$) and the observed group mean $\bar{y}_{i\cdot}$ (with weight $n/\sigma^2$, the total precision of $n$ observations). The denominator simply rescales these weights so they sum to one.

Considering the variance of the posterior distribution: your formula for ${\rm Var}(\theta_i)$ sums the precision of the prior and the total precision of the observations, then inverts the result to obtain a variance.

In summary, the formulas you provide for posterior mean and posterior variance are correct given the prior $\theta_i \sim N(b_0,\tau^2)$, and given $n$ observations with known observation variance $\sigma^2$.
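The precision view can be written out in a few lines (illustrative numbers, assuming nothing beyond the formulas above): the posterior precision is simply the sum of the prior precision and the total data precision.

```python
# Precision view of the conjugate-normal update: precisions add, and the
# posterior variance is the inverse of the summed precision (made-up numbers).
sigma2, tau2, n = 4.0, 1.0, 10
K_obs_total = n / sigma2        # total precision contributed by the data
K_prior = 1.0 / tau2            # precision of the prior N(b0, tau2)
K_post = K_prior + K_obs_total  # precisions add under conjugate updating
post_var = 1.0 / K_post         # invert back to obtain the posterior variance
```

This is exactly the formula ${\rm Var}(\theta_i) = \left(\frac{n}{\sigma^2} + \frac{1}{\tau^2}\right)^{-1}$, just written with the precisions named explicitly.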


If your concern is with specifying the parameters of the prior distribution $N(b_0,\tau^2)$, note that the influence of the prior mean $b_0$ on the posterior mean ${\rm mean}(\theta_i)$ diminishes as the number of observations increases and as the prior variance increases. Therefore, don't worry too much about the prior mean: just increase the prior variance as seems appropriate, and increase the number of observations!
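This shrinkage behavior is easy to demonstrate numerically. In the following sketch (function name and numbers are mine, purely illustrative), the posterior mean is written as a convex combination of the data mean and the prior mean, and the weight on the data grows with $n$:

```python
# Illustration: the prior mean's influence on the posterior mean shrinks
# as n grows (and, similarly, as the prior variance tau2 grows).
def post_mean(ybar, n, sigma2, b0, tau2):
    w = (n / sigma2) / (n / sigma2 + 1.0 / tau2)  # weight on the data mean
    return w * ybar + (1.0 - w) * b0

small_n = post_mean(ybar=2.0, n=2, sigma2=4.0, b0=0.0, tau2=1.0)
large_n = post_mean(ybar=2.0, n=200, sigma2=4.0, b0=0.0, tau2=1.0)
# large_n sits much closer to the data mean ybar = 2.0 than small_n does
```

With $n = 2$ the prior pulls the posterior mean well below the group mean; with $n = 200$ the data dominate and the prior mean hardly matters.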
