You have framed your question very well.
I think what you are looking for here is hierarchical modeling, and you may want to model multiple layers of hierarchy (at the moment you only talk about priors). Adding a layer of hyper-priors on the hyper-parameters lets you model the additional variability in those hyper-parameters, which is the concern you raised. It also makes the model more flexible and robust, though inference may be slower.
Specifically in your case, you may benefit from placing priors on the Dirichlet distribution parameters (Beta is a special case). This post by Gelman talks about how to impose priors on the parameters of a Dirichlet distribution. He also cites one of his papers in a toxicology journal.
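To make the layering concrete, here is a minimal generative sketch of a three-level Beta-Binomial model. The Gamma(2, 2) hyper-prior, the function name, and the group and trial counts are all assumptions chosen for illustration; they are not the prior recommended in Gelman's post, and this only simulates from the model rather than fitting it.

```python
import random

def simulate_hierarchical_beta_binomial(n_groups, n_trials, seed=0):
    """Draw one data set from a three-layer model (illustrative only)."""
    rng = random.Random(seed)
    # Hyper-prior layer (assumed): Gamma(shape=2, scale=2) on each Beta parameter.
    alpha = rng.gammavariate(2.0, 2.0)
    beta = rng.gammavariate(2.0, 2.0)
    # Prior layer: each group's success probability is drawn from Beta(alpha, beta).
    thetas = [rng.betavariate(alpha, beta) for _ in range(n_groups)]
    # Likelihood layer: Binomial(n_trials, theta) counts, built from Bernoulli draws.
    counts = [sum(rng.random() < theta for _ in range(n_trials)) for theta in thetas]
    return alpha, beta, thetas, counts

alpha, beta, thetas, counts = simulate_hierarchical_beta_binomial(n_groups=5, n_trials=20, seed=1)
print("hyper-parameters:", alpha, beta)
print("group probabilities:", thetas)
print("observed counts:", counts)
```

Running it with different seeds shows how uncertainty in the hyper-parameters propagates into the group-level probabilities, which is the extra variability a fixed-hyper-parameter model cannot express.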
The way you've defined them, $A$, $B$, and $C$ are not disjoint events ($A$ and $C$ can happen at the same time, for example), so the equation $P(b)=\sum_{X\in\{A,B,C\}} P(b|X)$ doesn't work.
It would be nicer to define $A$ as "A is executed", $B$ as "B is executed", and $C$ as "C is executed". This way $A,B,C$ are disjoint and cover the whole probability space, so $P(A)+P(B)+P(C)=1$.
Assuming that we are looking from A's point of view, we leave $b$ meaning "A finds out that B is pardoned".
So we have $P(A)=P(B)=P(C)=\frac{1}{3}$.
$P(b|A) = \frac{1}{2}$ since the warden could mention either B or C to A.
$P(b|B) = 0$ since if B is executed, the warden won't tell A that B is pardoned.
$P(b|C) = 1$ since if C is executed, the warden has no choice but to tell A that B is pardoned.
$$P(A|b) = \frac{P(b|A)P(A)}{P(b|A)P(A) + P(b|B)P(B) + P(b|C)P(C)}$$
Since the three priors are all equal to $\frac{1}{3}$, they cancel:
$$P(A|b) = \frac{P(b|A)}{P(b|A) + P(b|B) + P(b|C)}$$
$$P(A|b) = \frac{\frac{1}{2}}{\frac{1}{2} + 0 + 1}$$
$$P(A|b) = \frac{1}{3}$$
So knowing $b$ doesn't change A's chances of being executed.
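The calculation above can be checked with exact arithmetic. This is a small sketch using the priors and likelihoods stated in this answer; the function name `posterior` and the dictionary layout are just illustrative choices.

```python
from fractions import Fraction

def posterior(hypothesis, prior, likelihood):
    """Bayes' theorem over a set of disjoint, exhaustive hypotheses."""
    evidence = sum(likelihood[x] * prior[x] for x in prior)
    return likelihood[hypothesis] * prior[hypothesis] / evidence

# P(X): each prisoner is equally likely to be the one executed.
prior = {"A": Fraction(1, 3), "B": Fraction(1, 3), "C": Fraction(1, 3)}
# P(b | X): probability the warden tells A that B is pardoned.
likelihood = {"A": Fraction(1, 2), "B": Fraction(0), "C": Fraction(1)}

print(posterior("A", prior, likelihood))  # 1/3
print(posterior("C", prior, likelihood))  # 2/3
```

The second line shows where the probability mass went: A's chance is unchanged, while C's chance of being executed rises to $\frac{2}{3}$.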
No, that's not what it says. It says that a 95% confidence interval for the actual chance of it occurring is approximately [0, 3/n]. That is not the same thing. The largest value for the 'chance of occurring' contained in the interval is indeed 3/n, though the question of which of the values within the interval is most likely is not answered.
The rule says: 'guess that the true chance of occurring is 3/n or less and you will be wrong about 5% of the time'.
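To see where the 3/n comes from, here is a short sketch under the usual assumption behind the rule: zero occurrences were observed in n independent trials. The exact one-sided 95% upper bound then solves $(1-p)^n = 0.05$, and 3/n is a convenient approximation to it. The function names are illustrative only.

```python
def exact_upper_bound(n, confidence=0.95):
    """Largest p for which seeing zero events in n trials has probability >= 1 - confidence."""
    return 1.0 - (1.0 - confidence) ** (1.0 / n)

def rule_of_three(n):
    """Approximate upper bound given by the rule of three."""
    return 3.0 / n

for n in (10, 30, 100, 1000):
    print(n, exact_upper_bound(n), rule_of_three(n))
```

The approximation is always slightly above the exact bound, so it is a little conservative, and the two get closer as n grows.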
Exactly, so there is no need for a confidence interval because the 'chance of occurring' is known. You could, on the other hand, test the coverage of the approximate interval that the rule provides using such a wheel.
It is a misapplication of the idea of a confidence interval, which is applied to bound the range of plausible values of things that are unknown, and which in any particular application need not contain the true value if it becomes known.