Negative Binomial likelihood and Beta prior


I'm trying to work out what the posterior is (or, more specifically, what its parameters are) when the likelihood comes from a Negative Binomial distribution and the prior is assumed to be beta [since the beta is a conjugate prior for the Negative Binomial distribution].

According to this thread, the posterior follows a Beta distribution with parameters (the same as on Wikipedia) $$ \alpha^* = \alpha + \sum x_i, \qquad \beta^* = \beta + nk$$

According to this other thread, the posterior follows a Beta distribution with parameters $$ \alpha^* = \alpha + nr, \qquad \beta^* = \beta + \sum x_j - nr$$

Also, here is a third example, with yet another set of parameters.

Does anyone have an explanation of why these differ, and how I should think about it?

Best Answer

Yes, the explanation is that it all depends on the parametrization of the negative binomial PMF.

For consistency, I will choose the parametrization in the second link, namely $$\Pr[X = x \mid r, p] = \binom{x - 1}{r - 1} p^r (1-p)^{x-r}, \quad x \in \{r, r+1, r+2, \ldots \}.$$ Here $X$ represents the random number of trials needed to observe the $r^{\rm th}$ success in a sequence of independent and identically distributed Bernoulli trials with probability of success $p$. If $p$ is itself beta distributed with hyperparameters $a, b$, then the kernel of the posterior (likelihood times prior) is $$\Pr[X = x \mid r, p]f(p) \propto p^r (1-p)^{x-r} p^{a-1} (1-p)^{b-1} = p^{r+a-1} (1-p)^{x-r+b-1},$$ which is the kernel of a beta density with posterior hyperparameters $$a^* = r+a, \quad b^* = x-r+b.$$

This is for a single observation from $X$. If we have an i.i.d. sample $\boldsymbol x = (x_1, \ldots, x_n)$, the likelihood is the product of the $n$ individual PMFs, so the exponents of $p$ and $1-p$ simply add across observations and the posterior kernel takes the form $$\Pr[\boldsymbol X = \boldsymbol x \mid r, p]f(p) \propto \prod_{i=1}^n p^{r} (1-p)^{x_i - r} \cdot p^{a-1} (1-p)^{b-1} = p^{nr + a - 1} (1-p)^{\sum x_i - nr + b - 1},$$ hence the posterior is beta with hyperparameters $$a^* = nr + a, \quad b^* = n(\bar x - r) + b.$$ For convenience I have written the sample total $\sum x_i$ as $n \bar x$. This result is consistent with the answer in the second link.
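As a numerical sanity check of the "trials until the $r^{\rm th}$ success" update, here is a minimal sketch in plain Python (standard library only). The function names and the data are hypothetical, chosen for illustration. The idea is that if the posterior really is Beta$(nr + a, \sum x_i - nr + b)$, then likelihood × prior divided by that Beta density should be the same constant (the marginal likelihood) at every value of $p$.

```python
import math

def nb_trials_pmf(x, r, p):
    """P[X = x | r, p], X = number of trials needed for the r-th success (x >= r)."""
    return math.comb(x - 1, r - 1) * p**r * (1 - p) ** (x - r)

def beta_pdf(p, a, b):
    """Beta(a, b) density at p in (0, 1), normalizing constant via log-gamma."""
    log_c = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_c + (a - 1) * math.log(p) + (b - 1) * math.log1p(-p))

def posterior_trials(xs, r, a, b):
    """Conjugate update: a* = n r + a, b* = sum(x) - n r + b."""
    n = len(xs)
    return n * r + a, sum(xs) - n * r + b

def kernel_ratios(xs, r, a, b, ps):
    """(likelihood * prior) / Beta(a*, b*) density at each p.

    If the update is right, the posterior is proportional to Beta(a*, b*),
    so every ratio equals the same constant (the marginal likelihood).
    """
    a_post, b_post = posterior_trials(xs, r, a, b)
    ratios = []
    for p in ps:
        lik = math.prod(nb_trials_pmf(x, r, p) for x in xs)
        ratios.append(lik * beta_pdf(p, a, b) / beta_pdf(p, a_post, b_post))
    return ratios

xs, r, a, b = [5, 7, 4, 9], 3, 2.0, 3.0   # hypothetical sample and prior
print(posterior_trials(xs, r, a, b))       # (14.0, 16.0)
ratios = kernel_ratios(xs, r, a, b, [0.1, 0.3, 0.5, 0.7, 0.9])
print(all(math.isclose(q, ratios[0], rel_tol=1e-9) for q in ratios))  # True
```

Checking the ratio, rather than normalizing on a grid, avoids numerical integration and tests exactly the kernel argument above.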


Now compare this with the answer in the first link. The difference is that in the first link, $X$ counts the number of successes observed before the $r^{\rm th}$ failure; i.e., the parametrization is $$\Pr[X = x \mid r, p] = \binom{x+r-1}{r-1}(1-p)^r p^x, \quad x \in \{0, 1, 2, \ldots \}.$$ As before, $p$ represents the success probability for the Bernoulli trials and has a beta prior with hyperparameters $a, b$. Then the posterior kernel is $$\Pr[\boldsymbol X = \boldsymbol x \mid r, p]f(p) \propto p^{\sum x_i} (1-p)^{nr} p^{a-1} (1-p)^{b-1} = p^{n \bar x + a - 1} (1-p)^{nr + b - 1}$$ and the posterior hyperparameters are $$a^* = n\bar x + a, \quad b^* = nr + b,$$ which is the result in the first link, where the number of failures is written $k$ instead of $r$.
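The same kind of check can be run for the "successes before the $r^{\rm th}$ failure" parametrization. Again this is only a sketch with hypothetical names and data: it computes $a^* = \sum x_i + a$, $b^* = nr + b$ and confirms that likelihood × prior is proportional to the Beta$(a^*, b^*)$ density.

```python
import math

def nb_successes_pmf(x, r, p):
    """P[X = x | r, p], X = number of successes before the r-th failure (x >= 0)."""
    return math.comb(x + r - 1, r - 1) * (1 - p) ** r * p**x

def beta_pdf(p, a, b):
    """Beta(a, b) density at p in (0, 1), normalizing constant via log-gamma."""
    log_c = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_c + (a - 1) * math.log(p) + (b - 1) * math.log1p(-p))

def posterior_successes(xs, r, a, b):
    """Conjugate update: a* = sum(x) + a, b* = n r + b."""
    return sum(xs) + a, len(xs) * r + b

def kernel_ratios(xs, r, a, b, ps):
    """(likelihood * prior) / Beta(a*, b*) density; constant in p if the update is right."""
    a_post, b_post = posterior_successes(xs, r, a, b)
    return [
        math.prod(nb_successes_pmf(x, r, p) for x in xs) * beta_pdf(p, a, b)
        / beta_pdf(p, a_post, b_post)
        for p in ps
    ]

xs, r, a, b = [2, 0, 5, 3], 3, 2.0, 3.0   # hypothetical sample and prior
print(posterior_successes(xs, r, a, b))    # (12.0, 15.0)
ratios = kernel_ratios(xs, r, a, b, [0.1, 0.3, 0.5, 0.7, 0.9])
print(all(math.isclose(q, ratios[0], rel_tol=1e-9) for q in ratios))  # True
```

Note that here the sample total feeds $a^*$ rather than $b^*$, because the observed count now multiplies the exponent of $p$ instead of $1-p$.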

Since you did not point to a specific formula in the third link, I will not elaborate further. As the first two links show, though, the form of the posterior hyperparameters depends on how the observed variable is parametrized: whether you count trials, successes, or failures, and whether you let the beta density model the probability of success or of failure, all change which quantities end up in $a^*$ and $b^*$.
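To illustrate that these are different bookkeeping for the same inference, the following sketch (hypothetical data and helper names) takes trial counts $t_i$, computes the posterior for $p$ with the second-link formula, and then re-expresses the same data in the first-link form. For the relabeling, the failures are treated as that formula's "successes": the counted variable becomes $y_i = t_i - r$ (failures before the $r^{\rm th}$ success), the parameter becomes $q = 1 - p$ with prior Beta$(b, a)$, and a Beta$(a^*, b^*)$ posterior for $q$ is a Beta$(b^*, a^*)$ posterior for $p$.

```python
def post_trials(ts, r, a, b):
    """Second-link form: X = trials until the r-th success, p ~ Beta(a, b)."""
    n = len(ts)
    return n * r + a, sum(ts) - n * r + b

def post_successes_before_failure(xs, r, a, b):
    """First-link form: X = 'successes' before the r-th 'failure', parameter ~ Beta(a, b)."""
    n = len(xs)
    return sum(xs) + a, n * r + b

ts, r, a, b = [5, 7, 4, 9], 3, 2.0, 3.0   # hypothetical trial counts and prior on p

# Route 1: use the trial counts directly.
route1 = post_trials(ts, r, a, b)

# Route 2: swap the labels so our failures are the first-link "successes".
# The count is y = t - r, the parameter is q = 1 - p with prior Beta(b, a),
# and the resulting Beta(a*, b*) for q is flipped back to Beta(b*, a*) for p.
ys = [t - r for t in ts]
q_a, q_b = post_successes_before_failure(ys, r, b, a)
route2 = (q_b, q_a)

print(route1, route2)   # (14.0, 16.0) (14.0, 16.0)
```

Both routes give the same posterior for $p$, even though the two formulas look different on paper.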
