Solved – Updating a beta-binomial

bayesian, beta-binomial distribution, prior

Suppose I'm modeling a set of processes using a beta-binomial prior. I can build parameterized beta-binomial models that average over large groups of the processes to give reasonable, although coarse, priors.

$p_i \sim \beta B(n, \alpha_i, \beta_i)$ (roughly)

I know how to update those priors using observed partial data via Bayes' rule. However, for a subset of the priors I also have a little extra historical data that I'd like to incorporate into the prior; call it $h_j$, where $j$ ranges over a subset of the $i$s. The result would be an updated distribution, call it $p'_j$. That additional data is a scalar. For example, if I've got a beta-binomial with $n=9$, $\alpha=2$ and $\beta=3$ (see the examples for the dbetabinom.ab function in the VGAM R package), it has a mode of 3, but I might have additional prior information suggesting the mode should be closer to 6. I happen to know that this additional information is only modestly predictive ($r$ of .4, say), but it's still better than nothing, and for this particular process it's known to be a better predictor than the expected value of my existing beta-binomial prior ($r$ of around .3).
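As a sanity check on the numbers above, here's a short Python sketch (scipy's `betabinom` should mirror VGAM's `size`/`shape1`/`shape2` parameterization) confirming the mode and mean of a beta-binomial with $n=9$, $\alpha=2$, $\beta=3$:

```python
import numpy as np
from scipy.stats import betabinom

n, a, b = 9, 2, 3
k = np.arange(n + 1)                  # support 0..n
pmf = betabinom.pmf(k, n, a, b)

mode = int(k[np.argmax(pmf)])         # most likely count under the prior
mean = betabinom.mean(n, a, b)        # n * a / (a + b)
print(mode, mean)                     # prints: 3 3.6
```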

So what I'm looking for is a way to update the beta-binomial using this scalar, such that the result is also a beta-binomial, which I can then update like any of my other process models as data comes in. (That is, I need a closed-form expression.) $(\alpha'_i, \beta'_i) = f(\alpha_i, \beta_i, h_i, \theta)$, where $\theta$ captures the relative estimated predictiveness of the original beta-binomial and the scalar $h$.
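For illustration only, here is one hypothetical candidate for such an $f$ (my own sketch, not something from the thread): linear pooling on the prior mean with weight $\theta$, keeping the effective sample size $\alpha+\beta$ fixed. The function name and the specific pooling rule are assumptions:

```python
def pooled_update(alpha, beta, h, n, theta):
    """Hypothetical rule: pull the prior mean of p toward h/n by weight
    theta in [0, 1], keeping the effective sample size alpha + beta.
    theta = 0 returns the original prior; theta = 1 centers it on h/n."""
    s = alpha + beta                          # effective prior sample size
    m = alpha / s                             # prior mean of p
    m_new = (1 - theta) * m + theta * (h / n)  # pooled mean
    return s * m_new, s * (1 - m_new)         # (alpha', beta')

# example: the n=9, alpha=2, beta=3 prior, scalar pointing toward 6 successes
alpha_new, beta_new = pooled_update(2, 3, 6, 9, 0.5)
print(alpha_new, beta_new)   # roughly 2.667 and 2.333
```

Here $\theta$ would be the weighting parameter you'd pick by cross-validation; $\theta$ trades off the prior's mean against the scalar's implied proportion $h/n$.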

What's a reasonable approach here? Is there a way to adjust the $\alpha$ and $\beta$ parameters so that the central tendency is pulled an appropriate amount towards my modestly-predictive scalar? I'm happy to use cross-validation or something to identify a weighting parameter, if that's the right way to go about this.

Best Answer

Let's see if I understand Harlan's (and Srikant's) formulation correctly.

$$\pi_1 \sim \mathrm{beta}(\alpha_1,\beta_1)$$ $$\pi_2 \sim \mathrm{beta}(\alpha_2,\beta_2)$$

Say $\pi_1$ corresponds to the data set for which you have less information a priori, and $\pi_2$ to the more precise one.

Using Srikant's formulation:

$$\pi(p) \propto \pi_1(p)\,\alpha + \pi_2(p)\,(1-\alpha)$$

where $\alpha \in [0,1]$ is the mixture weight (not to be confused with the beta parameters $\alpha_1$, $\alpha_2$).

Therefore, the complete hierarchical formulation will be: $$\alpha \sim \mathrm{beta}(\alpha_0,\beta_0)$$ $$p \mid \alpha \sim \pi(p)$$ $$y_i \mid p \sim B(n_i,p)$$
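One useful property of this mixture prior, which I believe holds for any fixed mixture weight: a mixture of betas is conjugate to the binomial likelihood, so the posterior on $p$ is again a mixture of betas, with each component updated as usual and the weights rescaled by each component's beta-binomial marginal likelihood. A Python check, with all numeric values below chosen only for the demo:

```python
import numpy as np
from scipy.stats import beta as beta_dist, betabinom
from scipy.integrate import trapezoid

a1, b1 = 2.0, 3.0     # vaguer component   (illustrative)
a2, b2 = 8.0, 4.0     # sharper component  (illustrative)
w = 0.7               # fixed mixture weight (the alpha above)
y, n = 6, 9           # observed successes / trials

# closed form: update each component, reweight by marginal likelihoods
m1 = betabinom.pmf(y, n, a1, b1)
m2 = betabinom.pmf(y, n, a2, b2)
w_post = w * m1 / (w * m1 + (1 - w) * m2)

def posterior_pdf(p):
    return (w_post * beta_dist.pdf(p, a1 + y, b1 + n - y)
            + (1 - w_post) * beta_dist.pdf(p, a2 + y, b2 + n - y))

# brute-force Bayes on a grid agrees with the closed form
grid = np.linspace(1e-6, 1 - 1e-6, 20001)
prior = w * beta_dist.pdf(grid, a1, b1) + (1 - w) * beta_dist.pdf(grid, a2, b2)
unnorm = prior * grid**y * (1 - grid)**(n - y)
brute = unnorm / trapezoid(unnorm, grid)
assert np.allclose(posterior_pdf(grid), brute, atol=1e-6)
```

So with the weight fixed, no MCMC is strictly needed for a single update; sampling becomes useful once the weight itself gets the beta hyperprior.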

I assume here that $y_i|p$ are iid. I don't know if this is a valid assumption in your case. Am I correct?

You can choose $\alpha_0$ and $\beta_0$ so that the mean of this beta distribution is 0.8 (or 0.2), according to your formulation.
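For concreteness: the mean is $\alpha_0/(\alpha_0+\beta_0)$, so any concentration $c = \alpha_0+\beta_0$ gives $\alpha_0 = mc$, $\beta_0 = (1-m)c$. The helper name and $c = 5$ below are just illustrative:

```python
def beta_from_mean(m, c):
    """Hypothetical helper: beta parameters with mean m and
    concentration (alpha0 + beta0) equal to c."""
    return m * c, (1 - m) * c

a0, b0 = beta_from_mean(0.8, 5.0)   # a0 = 4.0, b0 ≈ 1.0
```

A larger $c$ expresses more confidence in the chosen mean.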

Now the MCMC sampling can be done using OpenBUGS or JAGS (untested).

I will add more to this (and recheck formulation) as soon as I get more time. But please point out if you see a fallacy in my argument.