Bayesian – Simultaneous Bayes Estimation Detailed Guide

bayesian, bernoulli-distribution, finite-mixture-model, mixture-distribution

Given $\theta_i$ with $0 < \theta_i < 1$, consider a sequence of independent Bernoulli($\theta_i$) random variables from each of $i$ subpopulations, also independent across subpopulations. Suppose $i = 2$ (two distinct subpopulations in the population, with subpopulation proportions $\pi_1$ and $\pi_2$ satisfying $\sum_i \pi_i = 1$). How do we go about simultaneous estimation of $\theta_1$ and $\theta_2$ under the following conditions:

  1. $\theta_1$ and $\theta_2$ have beta prior distributions
  2. $\pi_1$ and $\pi_2$ have a Dirichlet prior distribution
  3. $\pi$ is unknown; $\theta$ and $\pi$ are independent
  4. The estimation loss is the sum of component losses, where each component loss is squared error,
    $L(\theta_i, \hat \theta_i) = (\theta_i - \hat \theta_i)^2$

How do we find the Bayes estimator of $\theta = (\theta_1, \theta_2)$ and its posterior expected loss?

Any solved example/direction/text/links would be helpful!
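
To make the data-generating process concrete, here is a minimal simulation sketch of the setup above (all parameter values are illustrative assumptions, not part of the question):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative hyperparameters (assumptions for the sketch only).
pi = rng.dirichlet([2.0, 2.0])             # subpopulation proportions, sum to 1
theta = rng.beta([1.0, 3.0], [3.0, 1.0])   # theta_1, theta_2 drawn from beta priors

n = 1000
labels = rng.choice(2, size=n, p=pi)       # subpopulation of each observation
y = rng.binomial(1, theta[labels])         # Bernoulli(theta_i) outcomes

print("pi:", pi, "theta:", theta, "sample mean:", y.mean())
```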

Update:

Now generalize the number of subpopulations from 2 to $I$ ($i = 1, \dots, I$) and use a "scaled" squared-error loss function,

$$L(\theta_i, \hat\theta_i) = \frac{(\theta_i - \hat\theta_i)^2}{\theta_i(1 - \theta_i)},$$

with everything else (conditions 1–3 above) remaining the same: $\theta_1, \dots, \theta_I$ have independent $\mathsf{Beta}(\alpha_{i0}, \sigma_{i0} - \alpha_{i0})$ prior distributions, and if the mixing proportions $\pi$ are unknown, $\pi \sim \mathsf{Dirichlet}(a_{10}, \dots, a_{I0})$ with $a_{i0} > 0$, subject to the constraint $\sum_i \pi_i = 1$. The proportions are treated as nuisance parameters.

The posterior distributions of $\theta$ given the data are independent $\mathsf{Beta}(\alpha_{in}, \sigma_{in} - \alpha_{in})$ distributions, where $\alpha_{in} = \alpha_{i0} + X_{i1n}$ (with $X_{i1n}$ the number of successes from the $i$th subpopulation) and $\sigma_{in} = \sigma_{i0} + X_{i\cdot n}$. If the mixing proportions $\pi$ are unknown, then given the data $\pi$ is independent of $\theta$ and has a $\mathsf{Dirichlet}(a_{1n}, \dots, a_{In})$ posterior distribution, where $a_{in} = a_{i0} + X_{i\cdot n}$ and $X_{i\cdot n} = X_{i1n} + X_{i2n}$ is the number of observations (out of the total $n$) from the $i$th subpopulation.
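
Assuming the subpopulation membership counts are observed, as the updates above presuppose, the posterior parameters follow directly from the counts. A minimal sketch with illustrative counts and priors (all values hypothetical):

```python
import numpy as np

# Observed counts per subpopulation i (illustrative data):
# X_i1n successes, X_i2n failures, X_i.n = X_i1n + X_i2n observations.
successes = np.array([30.0, 120.0])   # X_i1n
failures = np.array([70.0, 80.0])     # X_i2n
totals = successes + failures         # X_i.n

# Priors in the Beta(alpha, sigma - alpha) parameterization, plus Dirichlet(a_i0).
alpha0 = np.array([1.0, 1.0])
sigma0 = np.array([2.0, 2.0])         # sigma_i0 = alpha_i0 + beta_i0
a0 = np.array([1.0, 1.0])

# Conjugate updates as stated above.
alpha_n = alpha0 + successes          # alpha_in = alpha_i0 + X_i1n
sigma_n = sigma0 + totals             # sigma_in = sigma_i0 + X_i.n
a_n = a0 + totals                     # a_in = a_i0 + X_i.n (Dirichlet posterior)
print(alpha_n, sigma_n, a_n)
```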

Considering the scaled loss function given above and the posterior distribution of $\theta$, the Bayes estimator of $\theta$ given the data is easy to calculate as

$$\hat\theta_{in} = \frac{E[\,w(\theta_i)\,\theta_i \mid \text{data}\,]}{E[\,w(\theta_i) \mid \text{data}\,]},$$

where $w(\theta_i) = \frac{1}{\theta_i(1 - \theta_i)}$ is the scaling factor. After all calculations, the Bayes estimator of $\theta$ given the data is $\hat\theta_n = (\hat\theta_{1n}, \dots, \hat\theta_{In})$, where

$$\hat\theta_{in} = \frac{\alpha_{in} - 1}{\sigma_{in} - 2}.$$
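
This closed form can be sanity-checked by Monte Carlo, evaluating the weighted posterior expectations directly (a sketch with hypothetical posterior parameters):

```python
import numpy as np

rng = np.random.default_rng(1)
alpha_n, sigma_n = 31.0, 102.0   # hypothetical posterior parameters

# Draw from the Beta(alpha_n, sigma_n - alpha_n) posterior and apply w(theta).
theta = rng.beta(alpha_n, sigma_n - alpha_n, size=1_000_000)
w = 1.0 / (theta * (1.0 - theta))

mc = np.mean(w * theta) / np.mean(w)              # E[w*theta | data] / E[w | data]
closed_form = (alpha_n - 1.0) / (sigma_n - 2.0)   # (alpha_in - 1) / (sigma_in - 2)
print(mc, closed_form)                            # the two should nearly coincide
```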

But I am having difficulty calculating the posterior expected loss of $\hat\theta_n$ given the data, which is supposedly

$$\sum_{i=1}^{I} \frac{1}{\sigma_{in} - 2}.$$

Is there an easy way to calculate the P.E.L. for the scaled squared-error loss function? I am using the following expression, but I get a different answer from the one above:

$$E[\,w(\theta_i)\,\theta_i^2 \mid \text{data}\,] - \frac{\big(E[\,w(\theta_i)\,\theta_i \mid \text{data}\,]\big)^2}{E[\,w(\theta_i) \mid \text{data}\,]},$$

giving

$$\sum_{i=1}^{I} \left[ \frac{\alpha_{in}}{\sigma_{in} - \alpha_{in} - 1} - \frac{(\sigma_{in} - 1)(\alpha_{in} - 1)}{(\sigma_{in} - \alpha_{in} - 1)(\sigma_{in} - 2)} \right].$$

Is the above expression for calculating the P.E.L. incorrect?
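
As an algebra check, putting the two terms of the $i$th summand over a common denominator shows that the expression above does reduce to the stated answer:

$$\begin{align} \frac{\alpha_{in}}{\sigma_{in}-\alpha_{in}-1} - \frac{(\sigma_{in}-1)(\alpha_{in}-1)}{(\sigma_{in}-\alpha_{in}-1)(\sigma_{in}-2)} &= \frac{\alpha_{in}(\sigma_{in}-2) - (\sigma_{in}-1)(\alpha_{in}-1)}{(\sigma_{in}-\alpha_{in}-1)(\sigma_{in}-2)} \\ &= \frac{\sigma_{in}-\alpha_{in}-1}{(\sigma_{in}-\alpha_{in}-1)(\sigma_{in}-2)} \\ &= \frac{1}{\sigma_{in}-2}, \end{align}$$

so summing over $i$ recovers $\sum_{i=1}^{I} 1/(\sigma_{in}-2)$.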

Best Answer

What you are describing is a mixture of Bernoulli random variables. First, a minor correction: if you have two sub-populations, then you don't need a Dirichlet distribution for the mixing proportions $\pi_i$; just use the beta distribution, so that the proportion is

$$ \pi \sim \mathsf{Beta}(\alpha_\pi, \beta_\pi) $$

where the mixing proportions for the subgroups are $\pi$ and $1 - \pi$ respectively. In such a case, your model is

$$\begin{align} \theta_i &\sim \mathsf{Beta}(\alpha_{\theta_i}, \beta_{\theta_i}) \\ \pi &\sim \mathsf{Beta}(\alpha_\pi, \beta_\pi) \\ y_i &\sim \pi \; \mathsf{Bern}(\theta_1) + (1 - \pi) \; \mathsf{Bern}(\theta_2)\\ \end{align}$$

As with other mixtures, this model does not have a closed-form solution. The usual approach is to fit it by maximum likelihood or by Bayesian estimation: you either use the E-M algorithm (see e.g. those slides), or MCMC sampling in the Bayesian approach (e.g. this paper). If you decide to use the Bayesian approach, beware of the label switching problem, which usually needs special precautions. Once you have found the posterior distribution, if you are interested in the point estimate that minimizes the squared error, take the posterior mean, since it is the minimizer of squared-error loss.
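
For illustration, here is a minimal E-M sketch for this two-component Bernoulli mixture. It is a sketch under stated assumptions, not the exact method of the linked references: to keep the parameters identifiable, each sampled unit is assumed to contribute $m$ Bernoulli trials rather than a single one, and the data, function names, and starting values below are made up.

```python
import numpy as np

def em_bernoulli_mixture(s, m, n_iter=200):
    """EM (maximum likelihood) for a two-component mixture in which
    unit j contributes s[j] successes out of m Bernoulli trials drawn
    from one of two subpopulations with unknown membership."""
    pi, theta = 0.5, np.array([0.3, 0.7])  # arbitrary starting values
    for _ in range(n_iter):
        # E-step: responsibility of component 1 for each unit
        # (binomial kernels; the binomial coefficient cancels in the ratio).
        lik1 = pi * theta[0] ** s * (1 - theta[0]) ** (m - s)
        lik2 = (1 - pi) * theta[1] ** s * (1 - theta[1]) ** (m - s)
        r = lik1 / (lik1 + lik2)
        # M-step: responsibility-weighted proportion and success rates.
        pi = r.mean()
        theta = np.array([np.sum(r * s) / (m * np.sum(r)),
                          np.sum((1 - r) * s) / (m * np.sum(1 - r))])
    return pi, theta

# Usage on simulated data: true pi = 0.4, theta = (0.2, 0.8), m = 20 trials/unit.
rng = np.random.default_rng(3)
z = rng.random(2000) < 0.4
s = rng.binomial(20, np.where(z, 0.2, 0.8)).astype(float)
print(em_bernoulli_mixture(s, 20))  # near (0.4, (0.2, 0.8)), up to label order
```

The repeated-trials assumption matters: with a single binary outcome per unit, the likelihood depends on the parameters only through the marginal success probability $\pi\theta_1 + (1-\pi)\theta_2$, so repeated observations per unit (or informative priors) are needed to separate the components.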
