Bayesian Statistics – How to Obtain Bayes Estimator with Conjugate Prior

Tags: bayesian, beta-distribution, conjugate-prior, self-study

Consider $n$ observations $X_1, X_2, \dots, X_n$ from a $Beta(1,\theta)$ distribution.

Obtain the Bayes estimator for $\theta$ under the quadratic loss function when a conjugate prior is assumed for $\theta$.

My Doubts

So here the prior distribution is conjugate, which I take to mean it will also be from the beta family, so $g(\theta) = \frac{\theta^{\alpha-1} (1-\theta)^{\beta-1}}{B(\alpha, \beta)}$.

So with the help of the hint,

$g(\theta) = \alpha \theta^{\alpha-1}$

Now the joint distribution is

$f^*(x,\theta) = f(x|\theta) \times g(\theta) = \theta(1-x)^{\theta-1} \cdot \alpha\theta^{\alpha-1} = \alpha \theta^{\alpha} (1-x)^{\theta-1}$

Now we integrate $f^*(x,\theta)$ over $\theta$ to get the marginal distribution of $x$, $h(x)$:

$h(x) = \int_0^1 \alpha \theta^{\alpha} (1-x)^{\theta-1} \, d\theta$

$h(x) = \frac{\alpha}{1-x} \int_0^1 \theta^{\alpha} (1-x)^{\theta} \, d\theta$

Can anyone please explain the integration part? I am unable to figure it out.

Best Answer

So here the prior distribution is conjugate, which I take to mean it will also be from the beta family

Conjugacy means that the prior and the posterior belong to the same family of distributions. This family does not have to be the beta distribution.

Bayes' theorem

Start with Bayes' theorem formulated for continuous distributions.

$$f_{posterior}(\theta \vert x) = \frac{f_{likelihood}(x\vert\theta) \cdot f_{prior}(\theta)}{f_{normalization}(x)}$$

  • likelihood: The probability density of the observations $x$ as a function of $\theta$.

    You know that the distribution of the $x_i$ is a $Beta(1,\theta)$ distribution with density function $$f(x_i \vert \theta) = \theta(1-x_i)^{\theta-1}$$ and for the entire sample $$f(x_1, x_2, \dots , x_n \vert \theta) = \theta^n\left( \prod_{i=1}^n (1-x_i) \right)^{\theta-1}= \theta^n(GM^n)^{\theta-1}$$ where $GM = \left(\prod_{i=1}^n (1-x_i)\right)^{1/n}$ is the geometric mean of the terms $(1-x_i)$.

  • normalization: The marginal (prior predictive) probability of the observations, $f_{normalization}(x)$.

    This is effectively a normalization constant. It is independent of $\theta$ and we can ignore it when we write $$f_{posterior}(\theta \vert x) \propto f_{likelihood}(x\vert\theta) \cdot f_{prior}(\theta)$$ where $\propto$ means 'proportional to'.

In this way, you can look at Bayes' theorem without having to worry about constants. What we need to know is what the posterior looks like as a function of $\theta$; we can worry about the constant term (independent of $\theta$) later. The short sketch below illustrates that the likelihood depends on the data only through $GM$.
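As a small numerical illustration (the sample and the value of $\theta$ below are made up), the log-likelihood computed from the raw data agrees with the one computed from the summary $GM$:

```python
import numpy as np

# Hypothetical sample: n draws from a Beta(1, theta) distribution.
rng = np.random.default_rng(0)
theta_true = 3.0
x = rng.beta(1.0, theta_true, size=20)

n = len(x)
gm = np.exp(np.mean(np.log(1.0 - x)))  # geometric mean of the terms (1 - x_i)

def log_likelihood_from_gm(theta):
    # log of theta^n * (GM^n)^(theta - 1)
    return n * np.log(theta) + n * (theta - 1.0) * np.log(gm)

def log_likelihood_from_data(theta):
    # log of prod_i theta * (1 - x_i)^(theta - 1)
    return np.sum(np.log(theta) + (theta - 1.0) * np.log(1.0 - x))

print(log_likelihood_from_gm(2.5))    # identical up to rounding
print(log_likelihood_from_data(2.5))
```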

Finding the conjugate distribution

So, to find the conjugate distribution, you are looking for a function that remains in the same family after multiplication with the likelihood

$$f_{posterior}(\theta \vert x) \propto \theta^n(GM^n)^{\theta-1} \cdot f_{prior}(\theta)$$

Now, there is no straightforward technique for finding the conjugate distribution, but what I do is imagine a function whose form remains unchanged after multiplication with $\theta^n(GM^n)^{\theta-1}$. I let my mind pass over all sorts of forms: polynomials, exponentials, powers... powers?

  • If we have a power law like $\theta^a$ and multiply it with $\theta^n$, then we have again a power law, but with a different exponent, namely: $$\theta^n\theta^a = \theta^{n+a} = \theta^{a^\prime}$$ where $a^\prime = a + n$.
  • If we have a power law like $b^{\theta-1}$ and multiply it with $(GM^n)^{\theta-1}$, then we have again a power law, but with a different base, namely: $$(GM^n)^{\theta-1} \cdot b^{\theta-1} = (GM^n\cdot b)^{\theta-1} = {b^\prime}^{\theta-1}$$ where $b^\prime = GM^n\cdot b$.
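Putting these two observations together: a prior kernel of the form $\theta^a \, b^{\theta-1}$ is mapped by the likelihood to

$$\theta^n(GM^n)^{\theta-1} \cdot \theta^a \, b^{\theta-1} = \theta^{a+n}\,(GM^n \cdot b)^{\theta-1} = \theta^{a^\prime} \, {b^\prime}^{\theta-1}$$

which is the same functional form with updated coefficients.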

So, the conjugate distribution needs to be of the form $$f(\theta) \propto \theta^a \cdot b^{\theta-1} = \frac{1}{b} \theta^a \cdot b^{\theta}$$

This $\theta^a \cdot b^{\theta}$ looks familiar: it is the kernel of the gamma distribution.

Thus, the conjugate distribution is the gamma distribution.
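A quick grid check of this claim, with made-up values of $a$, $b$, $n$, and $GM$ (only the functional form matters here): the product prior × likelihood should equal the kernel with the updated coefficients, up to a constant factor.

```python
import numpy as np

# Made-up prior coefficients and data summary, for illustration only.
a, b = 2.0, 0.4      # prior kernel: theta^a * b^(theta - 1)
n, gm = 20, 0.35     # sample size and geometric mean of the (1 - x_i)

theta = np.linspace(0.1, 10.0, 500)

prior = theta**a * b**(theta - 1.0)
likelihood = theta**n * (gm**n)**(theta - 1.0)
posterior_kernel = prior * likelihood

# Predicted update: a' = a + n, b' = GM^n * b
a_new, b_new = a + n, gm**n * b
predicted = theta**a_new * b_new**(theta - 1.0)

# The ratio is constant across the grid, confirming the same family.
ratio = posterior_kernel / predicted
print(ratio.min(), ratio.max())
```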

The coefficients of the posterior can be expressed in terms of the coefficients of the prior by the previously derived $a^\prime = a + n$ and $b^\prime = GM^n\cdot b$. I will let you figure out the change of the normalization constant yourself.

Possibly it is better to use the gamma distribution parameterized not by $a$ and $b$ as above, but instead to redo the above work with $f(\theta) = \frac{\beta^\alpha}{\Gamma(\alpha)} \theta^{\alpha-1} e^{-\beta \theta} \propto \theta^{\alpha-1} e^{-\beta \theta}$.

The only difference is between $e^{-\beta \theta}$ and $b^{\theta}$, which are the same if you set $e^{-\beta} = b$.
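To connect this back to the quadratic loss in the original question: under quadratic loss the Bayes estimator is the posterior mean. Here is a minimal sketch assuming the $Gamma(\alpha, \beta)$ rate parameterization above, with placeholder prior hyperparameters; in this parameterization the update derived above becomes $\alpha^\prime = \alpha + n$ and $\beta^\prime = \beta - \sum_i \log(1 - x_i)$.

```python
import numpy as np

# Hypothetical sample from Beta(1, theta); prior hyperparameters are placeholders.
rng = np.random.default_rng(1)
theta_true = 3.0
x = rng.beta(1.0, theta_true, size=50)

alpha0, beta0 = 1.0, 1.0   # assumed Gamma(alpha, beta) prior, rate parameterization
n = len(x)

# Conjugate update: alpha' = alpha + n, beta' = beta - sum(log(1 - x_i))
alpha_post = alpha0 + n
beta_post = beta0 - np.sum(np.log(1.0 - x))   # each log(1 - x_i) < 0, so beta grows

# Bayes estimator under quadratic loss = posterior mean of Gamma(alpha', beta')
theta_hat = alpha_post / beta_post
print(theta_hat)

# Sanity check: brute-force posterior mean on a grid
theta = np.linspace(1e-3, 15.0, 4000)
log_post = (n * np.log(theta) + (theta - 1.0) * np.sum(np.log(1.0 - x))  # likelihood
            + (alpha0 - 1.0) * np.log(theta) - beta0 * theta)            # prior kernel
post = np.exp(log_post - log_post.max())
d = theta[1] - theta[0]
post /= post.sum() * d
print((theta * post).sum() * d)   # close to theta_hat
```

The grid check is only there to confirm the closed-form update; in the conjugate setting the estimator $\hat\theta = (\alpha + n)/(\beta - \sum_i \log(1 - x_i))$ is available directly.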
