I think one problem in creating an AI solution for what to bid in pinochle is that humans have a hard time answering the same question.
Two thoughts:
- I like the way you isolated it to 3-player single deck. This avoids the more complex calculations of "what does my partner have?" and "should I let my partner have the bid?" (does she have a better hand than mine?). It also eliminates the various partner-bidding strategies used to signal. (For example, some players use an opening bid of 51 to let their partner know that they have aces.)
- The basic question you are trying to answer is: (what I have in my hand) + (what I can expect from the 3 random cards) + (what I can expect to win during the trick phase).
What I have in my hand is easy (you have already given the list).
What can I expect from the 3 random cards? The low end of the range is zero, and the top end depends on what is in your hand. I think this would be easier to model than the trick phase. In a single deck there are no doubles, so the max would be 43, in the case where you had an ace, king, and queen in a single suit, named that suit trump, and the ace, king, and queen also completed sets (one of each suit).
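If you wanted to model that draw, the 3 random cards are a straightforward hypergeometric problem. Here is a minimal Python sketch, assuming the standard 3-player deal of a 48-card deck (15 cards per player plus the 3 extra cards), so 33 cards are unseen to the bidder; the function name and the example count are just for illustration:

```python
from math import comb

def p_draw_hit(useful_unseen: int, unseen: int = 33, draw: int = 3) -> float:
    """Probability that at least one of `useful_unseen` needed cards lands
    among the `draw` random cards, taken from the `unseen` cards the
    bidder cannot see (a hypergeometric calculation)."""
    # P(miss) = C(unseen - useful, draw) / C(unseen, draw)
    p_miss = comb(unseen - useful_unseen, draw) / comb(unseen, draw)
    return 1.0 - p_miss

# e.g. two unseen copies of a card that would complete a meld:
print(f"{p_draw_hit(2):.3f}")  # ~0.176
```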
For the trick phase, my mom taught me that a safe estimate was 1/2 the points available (since you named trump), with upward adjustments if you have aces, a majority of trump, or a "short suit" with a strong backup suit (one suit with zero or very few cards, paired with a non-trump suit where you have, for example, ace, ten, king). There are 24 points in a single deck, so 12 is a good starting point.
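That rule of thumb is easy to encode. Here is a sketch in Python; the base of 12 is half the 24 available points as described above, but the sizes of the upward adjustments are illustrative guesses, not established values:

```python
def estimate_trick_points(n_aces: int, trump_majority: bool,
                          short_suit_with_backup: bool) -> int:
    """Rough trick-phase estimate following the rule of thumb above."""
    points = 12          # half the 24 points available, since you named trump
    points += n_aces     # assumed: ~1 extra point per ace
    if trump_majority:
        points += 2      # assumed adjustment size
    if short_suit_with_backup:
        points += 2      # assumed adjustment size
    return min(points, 24)  # cannot win more than the points in play

print(estimate_trick_points(n_aces=3, trump_majority=True,
                            short_suit_with_backup=False))  # 17
```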
For humans, bidding is a combination of addition and instinct. I imagine that if I were programming a machine, I would put ranges in. Probabilities are no match for reality (what actually happens).
As @glen_b pointed out, likelihood is not an inverse probability, as $\theta$ is not a random variable. However, you are correct in that it is a measure of evidential support. One caveat is that, unlike probability, it is not an absolute measure of support (a likelihood of 1, 10, or 1000 has no intrinsic meaning), but a relative measure of support. Generally, this is encoded by forming the likelihood ratio (LR):
$$ LR(\theta) := \frac{L(\theta;x)}{L(\theta_{\text{MLE}};x)} $$
which will always be between 0 and 1. This is an improvement over the unnormalized likelihood, but we still aren't quite there. It turns out that, for example, $LR=0.15$ is not by itself a useful measure either, since its interpretation depends on the dimension of $\theta$. If $\theta$ is a scalar, then $P(LR<0.15) \xrightarrow{n\to\infty} 0.05$, so it can be used in a probabilistic framework in much the same way as any other test statistic.
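The scalar case can be checked numerically via Wilks' theorem, which says $-2\log LR$ is asymptotically $\chi^2$ with degrees of freedom equal to $\dim(\theta)$. A quick sketch, assuming scipy is available:

```python
from math import exp
from scipy.stats import chi2

# Wilks' theorem: -2*log(LR) -> chi-squared with df = dim(theta).
# For scalar theta, the LR value with a 5% asymptotic tail probability:
q = chi2.ppf(0.95, df=1)   # 95th percentile of chi-squared(1), ~3.841
print(exp(-q / 2))         # ~0.1465, i.e. the ~0.15 cutoff quoted above
```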
However, the LR can also be used as a purely subjective measure of which parameter values we consider "plausible" given the data (read: evidence). Under this non-probabilistic interpretation, we would say that any scalar $\theta$ that resulted in $LR<0.15$ would be "implausible" or "unlikely". Now, what if we wanted to port this same subjective assessment to a vector parameter, say $(\theta_1,\theta_2)$? Unfortunately, we cannot continue to use $0.15$ as our cutoff for "unlikely" (well, of course you can, but then your inferences at a higher dimension will not be compatible with inferences at a lower dimension; this is a subtle point, and a good article on it was written by one of the strongest proponents of likelihood inference, JK Lindsey; see here). Essentially, compatible inference can be implemented by raising the scalar likelihood cutoff to the power of the dimension of the vector parameter. For example, if our parameter dimension is 2, then a cutoff compatible with $0.15$ would be $0.15^2$.
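As a concrete sketch of this non-probabilistic reading, here is the normalized likelihood for a binomial model with made-up data (7 successes in 20 trials), flagging as "plausible" every scalar $\theta$ with $LR \geq 0.15$:

```python
import numpy as np

# Hypothetical data: x = 7 successes in n = 20 trials.
x, n = 7, 20
theta = np.linspace(0.001, 0.999, 999)
log_lik = x * np.log(theta) + (n - x) * np.log(1 - theta)
lr = np.exp(log_lik - log_lik.max())  # L(theta; x) / L(theta_MLE; x), in [0, 1]

plausible = theta[lr >= 0.15]         # the scalar cutoff discussed above
print(f"plausible theta: [{plausible.min():.3f}, {plausible.max():.3f}]")
```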
The above is a very abridged description of modern likelihood. I think your confusion is shown by the following statement you made:
> Given that Graham is using an umbrella, there is a 20% chance that it is raining.
This is actually not what a 20% likelihood would tell you. What you stated above is a Bayesian posterior probability, $P(\textrm{Raining}|\textrm{Umbrella})$; what the likelihood is saying is quite the opposite:
$$L(\textrm{Raining}|\textrm{Umbrella}) = P(\textrm{Umbrella}|\textrm{Raining})$$
As you correctly pointed out, a prior probability (and a normalizing constant) is required to turn a likelihood into a probability.
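To make that concrete, here is the arithmetic with made-up numbers: the likelihood is fixed at $0.2$, but the posterior moves with the prior and the normalizing constant.

```python
# Hypothetical inputs: the likelihood L(Raining | Umbrella) = P(Umbrella | Raining),
# plus the two quantities Bayes' rule additionally requires.
p_umb_given_rain = 0.20   # the 20% likelihood from the example
p_umb_given_dry = 0.05    # assumed
p_rain = 0.30             # assumed prior

p_umb = p_umb_given_rain * p_rain + p_umb_given_dry * (1 - p_rain)  # normalizer
posterior = p_umb_given_rain * p_rain / p_umb
print(f"P(Raining | Umbrella) = {posterior:.3f}")  # ~0.632, not 0.2
```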
The joint probability of events $A$ and $B$ is defined as $$ P(A,B) = P(A|B)P(B) = P(B|A)P(A). $$ This can be understood intuitively: the occurrence of either $A$ or $B$ might influence the probability of the other happening (these are called dependent events).

Scenario 1 is a special case in which the occurrence of either event does not affect the probability of the other ($A$ and $B$ are independent events); i.e., there are the same number of 6s that are red as there are in each other individual color, so $$ P(A|B) = P(A) \Rightarrow P(A,B) = P(A)P(B). $$

But the same is not true for the second scenario. There is dependency in the data, such that more people in the "planned to travel" category actually travelled. So to calculate the joint probability mathematically, we have to multiply $P(\textrm{planned to travel}\,|\,\textrm{travelled})$ by $P(\textrm{travelled})$. The formula for conditional probability is given by $$ P(A|B) = \frac{P(A,B)}{P(B)}, $$ which already has $P(A,B)$ in the numerator, so in this type of scenario the joint probability is rarely calculated multiplicatively.
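As a quick numeric check with a made-up contingency table for the travel scenario (the counts below are purely illustrative), both routes to the joint probability agree:

```python
# Hypothetical counts (rows: planned to travel?, columns: travelled?):
#                  travelled   did not travel
# planned              60            20
# did not plan         10            10
total = 100
p_joint = 60 / total                 # P(planned, travelled) read off directly
p_travelled = (60 + 10) / total      # P(travelled)
p_planned_given_travelled = 60 / 70  # P(planned | travelled)

# The multiplicative identity recovers the same joint probability:
assert abs(p_joint - p_planned_given_travelled * p_travelled) < 1e-12
print(p_joint)  # 0.6
```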