Using a Bin$(k,\theta)$ random sample ($k$ known), determine the posterior distribution for $\theta$ using a Jeffreys prior.

Tags: bayes-theorem, bayesian, binomial-distribution, log-likelihood

My question here is: am I misinterpreting what a binomial random sample is? Having done some research, I suspect I may have misunderstood something, yet my answer "feels" okay regardless. I worry that I've just used a few tricks that may not actually make sense to get my answer.

This is my interpretation of the question:

$$\begin{align}
&X_i|\theta \sim Bin(k, \theta), \; i = 1,\ldots,n, \quad 0\leq \theta \leq 1 \\
\implies f_{\underline{X}}(\underline{X}|\theta)
&= \prod_{i=1}^n {k \choose x_{i}} \theta^{x_i}(1 - \theta)^{k-x_{i}}\\
&= \theta^{\sum x_i}(1-\theta)^{nk - \sum x_i} \left(\prod_{i=1}^n {k \choose x_{i}}\right)\\
&\propto \theta^{\sum x_i}(1-\theta)^{k - \sum x_i}
\end{align}$$

(Once the data are observed, $\prod_{i=1}^n {k \choose x_{i}}$ is just a constant.)
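
As a quick numerical sanity check of this (the values of $k$, $n$ and the simulated sample below are arbitrary illustrative choices), the ratio of the full likelihood to the kernel should come out the same for every $\theta$:

```python
# Sanity check: joint binomial likelihood = kernel * constant (in theta).
# k, n, theta, and the seed are arbitrary illustrative choices.
import numpy as np
from scipy.stats import binom

k, n = 10, 5
rng = np.random.default_rng(0)
x = rng.binomial(k, 0.3, size=n)  # one simulated sample

for theta in (0.2, 0.5, 0.8):
    full = np.prod(binom.pmf(x, k, theta))                   # product of Bin(k, theta) pmfs
    kernel = theta**x.sum() * (1 - theta)**(n*k - x.sum())   # theta-dependent kernel only
    print(theta, full / kernel)  # ratio = product of binomial coefficients,
                                 # identical (up to rounding) for every theta
```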

So,

$$\begin{align}
\log{f_{\underline{X}}(\underline{X}|\theta)}
&= \left(\sum_{i=1}^n x_i\right)\log{\theta} + \left(nk - \sum_{i=1}^n x_i\right)\log{(1-\theta)} + \log\left(\prod_{i=1}^n {k \choose x_{i}}\right)\\
\implies \frac{\partial \log{f_{\underline{X}}(\underline{X}|\theta)}}{\partial \theta}
&= \frac{1}{\theta}\left(\sum_{i=1}^n x_i\right) - \frac{1}{1-\theta}\left(nk - \sum_{i=1}^n x_i\right) \\
\implies \frac{\partial^2 \log{f_{\underline{X}}(\underline{X}|\theta)}}{\partial \theta^2}
&= -\frac{1}{\theta^2}\left(\sum_{i=1}^n x_i\right) - \frac{1}{(1-\theta)^2}\left(nk - \sum_{i=1}^n x_i\right) \\ \\
\implies I(\theta)
&= E\left[- \frac{\partial^2 \log{f_{\underline{X}}(\underline{X}|\theta)}}{\partial \theta^2}\right]\\
&= E\left[\frac{1}{\theta^2}\left(\sum_{i=1}^n x_i\right) + \frac{1}{(1-\theta)^2}\left(nk - \sum_{i=1}^n x_i\right)\right]\\
&= \frac{1}{\theta^2}\left(\sum_{i=1}^n E[x_i|\theta]\right) + \frac{1}{(1-\theta)^2}\left(nk - \sum_{i=1}^n E[x_i|\theta]\right)\\
&= \frac{nk\theta}{\theta^2} + \frac{nk - nk\theta}{(1-\theta)^2} \qquad \text{(since } E[x_i|\theta] = k\theta\text{)}\\
&= \frac{nk}{\theta} + \frac{nk}{(1-\theta)^2} - \frac{nk\theta}{(1-\theta)^2}\\
&= \frac{nk}{\theta(1-\theta)}
\end{align}$$
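
A Monte Carlo sketch to double-check this closed form (with arbitrary $k$, $n$, $\theta$): averaging the negative second derivative of the log-likelihood over simulated samples should recover $nk/(\theta(1-\theta))$. This uses the fact that $\sum_i X_i \sim Bin(nk, \theta)$.

```python
# Monte Carlo check of I(theta) = nk / (theta (1 - theta)): average the
# negative second derivative of the log-likelihood over simulated samples.
# The k, n, theta values are arbitrary for illustration.
import numpy as np

k, n, theta = 10, 5, 0.3
rng = np.random.default_rng(1)
totals = rng.binomial(n * k, theta, size=200_000)  # sum of n Bin(k, theta) draws

neg_second_deriv = totals / theta**2 + (n*k - totals) / (1 - theta)**2
print(neg_second_deriv.mean())            # Monte Carlo estimate
print(n * k / (theta * (1 - theta)))      # closed form, approx. 238.1
```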

Hence, the Jeffreys prior is

$$
\pi(\theta) \propto \sqrt{\frac{nk}{\theta(1-\theta)}} \propto \sqrt{\frac{k}{\theta(1-\theta)}}
$$
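
As a quick check (the grid values are arbitrary), this kernel is exactly that of a $Beta(1/2, 1/2)$ density; its ratio to the normalized density is constant in $\theta$, equal to $B(1/2, 1/2) = \pi$:

```python
# The Jeffreys prior kernel 1/sqrt(theta(1-theta)) is the Beta(1/2, 1/2)
# kernel: its ratio to the normalized Beta(1/2, 1/2) pdf is constant in theta.
import numpy as np
from scipy.stats import beta

thetas = np.linspace(0.05, 0.95, 7)  # arbitrary grid points
ratio = (1 / np.sqrt(thetas * (1 - thetas))) / beta.pdf(thetas, 0.5, 0.5)
print(ratio)  # constant, equal to pi (the Beta(1/2, 1/2) normalizing constant)
```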

So, using Bayes' Theorem:

$$\begin{align}
\pi(\theta | \underline{X})
&\propto \theta^{\sum x_i}(1-\theta)^{k - \sum x_i} \sqrt{\frac{k}{\theta(1-\theta)}}\\
&\propto \theta^{n\bar{x} - 1/2}(1-\theta)^{k - n\bar{x} - 1/2}
\end{align}$$

Thus, we get a Beta posterior distribution, specifically:

$$
\theta|\underline{X} \sim Beta(n\bar{x} + 1/2, k-n\bar{x} + 1/2)
$$

Best Answer

For a binomial or Bernoulli likelihood on the parameter $\theta$, the conjugate prior is Beta distributed. It is not necessary to normalize the prior to make it a proper density because we only need to retain proportionality: $$f(\theta \mid x) \propto f(x \mid \theta) p(\theta).$$ Any constant with respect to $\theta$ on the right-hand side of the above will not affect the calculation of the posterior.

For example, the Jeffreys prior you calculated was $$p(\theta) = \sqrt{\frac{k}{\theta(1-\theta)}},$$ which is proportional to a Beta density with prior hyperparameters $\alpha = \beta = 1/2$. For a likelihood of a single binomial observation $$f(x \mid \theta) = \binom{k}{x} \theta^x (1-\theta)^{k-x},$$ this gives us the posterior

$$f(\theta \mid x) \propto \binom{k}{x} \sqrt{k} \theta^{x-1/2} (1-\theta)^{k-x-1/2}.$$

But the factor $\binom{k}{x} \sqrt{k}$ is irrelevant: what matters is the kernel $\theta^{x-1/2} (1-\theta)^{k-x-1/2}$, which is proportional to a Beta density with posterior hyperparameters $\alpha^* = x + 1/2$, $\beta^* = k-x+1/2$. Had we instead used $$p(\theta) = \frac{1}{\sqrt{\theta(1-\theta)}}$$ as our Jeffreys prior, the result would be the same for the posterior because we ignored the $\sqrt{k}$ factor anyway. Or we could have even used the proper Beta Jeffreys prior density $$f(\theta) = \frac{1}{\pi \sqrt{\theta(1-\theta)}}.$$ It makes no difference.
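
To see this concretely, here is a small grid sketch (the values $k = 10$ and $x = 3$ are arbitrary illustrative choices): normalizing likelihood $\times$ prior numerically recovers the $Beta(x + 1/2,\, k - x + 1/2)$ density whether or not the constant factor $\binom{k}{x}\sqrt{k}$ is kept, since it cancels in the normalization.

```python
# Grid check: posterior from likelihood * Jeffreys prior matches
# Beta(x + 1/2, k - x + 1/2). k and x are arbitrary illustrative values;
# constants such as binom(k, x) * sqrt(k) cancel in the normalization.
import numpy as np
from scipy.integrate import trapezoid
from scipy.stats import beta, binom

k, x = 10, 3
thetas = np.linspace(1e-4, 1 - 1e-4, 10_000)

unnorm = binom.pmf(x, k, thetas) * np.sqrt(k / (thetas * (1 - thetas)))
posterior = unnorm / trapezoid(unnorm, thetas)   # numerical normalization

exact = beta.pdf(thetas, x + 0.5, k - x + 0.5)
print(np.max(np.abs(posterior - exact)))         # approx. 0, up to grid error
```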

The above easily generalizes to a likelihood on a sample of $n$ binomial observations:

$$f(\theta \mid x_1, \ldots, x_n) \propto \theta^{n \bar x - 1/2}(1-\theta)^{n(k - \bar x) - 1/2}.$$ Note that you have an error in your posterior: for a sample of size $n$, the sample total $\sum x_i = n \bar x$ is binomial with parameters $nk$ and $\theta$, so the likelihood is proportional to that of a single observation from a $Bin(nk, \theta)$ variable; i.e., $n\bar x \in \{0, 1, \ldots, nk\}$.
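
The same sketch works for the $n$-observation case (simulated data with arbitrary $k$, $n$, $\theta$): the numerically normalized posterior matches $Beta(n\bar x + 1/2,\, n(k - \bar x) + 1/2)$.

```python
# Grid check of the n-observation posterior against
# Beta(n*xbar + 1/2, n*(k - xbar) + 1/2). Data are simulated with
# arbitrary k, n, theta purely for illustration.
import numpy as np
from scipy.integrate import trapezoid
from scipy.stats import beta

k, n = 10, 5
rng = np.random.default_rng(2)
x = rng.binomial(k, 0.3, size=n)
total = x.sum()                                  # n * xbar, in {0, ..., nk}

thetas = np.linspace(1e-4, 1 - 1e-4, 10_000)
unnorm = thetas**(total - 0.5) * (1 - thetas)**(n*k - total - 0.5)
posterior = unnorm / trapezoid(unnorm, thetas)

exact = beta.pdf(thetas, total + 0.5, n*k - total + 0.5)
print(np.max(np.abs(posterior - exact)))         # approx. 0, up to grid error
```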