[Math] Game Theory/Bayesian approach to a bluffing game

gambling, game-theory, probability

Two players play the following card game with a deck consisting of (A,2,3,4,5).

  • A dollar is placed in the pot by some third party, and player 1 is dealt a card. If it is an A, he has a winning card, otherwise he has a losing card.
  • Player 1 can decide whether to fold (conceding the dollar to player 2) or bet, by placing an additional dollar in the pot.
  • If Player 1 bets, player 2 can decide to fold (conceding the pot to player 1) or call, by placing an additional dollar in the pot.
  • If player 2 calls, player 1 reveals his card – if it is an A, he takes the pot, otherwise player 2 takes the pot.

It's not hard to see that there's a Nash equilibrium where player 1 always bets if he has an ace, and bluffs 12.5% of the time if he doesn't. Player 2 calls 50% of player 1's bets. The long-run expectation is \$0.30/round for player 1, and \$0.70/round for player 2.
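These equilibrium numbers can be verified with a short exact calculation (a sketch; the function name and payoff bookkeeping are mine, not from the question):

```python
from fractions import Fraction

def p1_expectation(b, p):
    """Player 1's expected net gain per round, computed exactly.

    b: bluffing frequency with a losing card; p: player 2's calling frequency.
    Player 1's net payoffs: fold 0; bet and player 2 folds +1;
    bet and called +2 with an A, -1 otherwise.
    """
    ace = Fraction(1, 5)                   # one ace in a 5-card deck
    ace_ev = p * 2 + (1 - p) * 1           # always bet the A
    bluff_ev = p * (-1) + (1 - p) * 1      # bet a losing card
    return ace * ace_ev + (1 - ace) * b * bluff_ev

b_star, p_star = Fraction(1, 8), Fraction(1, 2)
print(p1_expectation(b_star, p_star))      # 3/10, i.e. $0.30/round for player 1
```

Since the third party contributes \$1 per round, player 2's share is the remaining \$0.70. Note also that at $p = 1/2$ the result does not depend on $b$ at all, which is exactly the indifference that sustains the mixed equilibrium.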

However, in reality it is unlikely that both players will play the Nash equilibrium strategy (they may not be rational, they may not believe that the other player is rational, they may just be playing for fun, etc etc).

From player 1's perspective, if player 2 calls a fraction $p$ of the time, then player 1 should always bluff if $p<0.5$ and never bluff if $p>0.5$, in order to maximize his own expectation. That is, the optimal bluffing frequency is

$$B(p) = \begin{cases} 1 & p < 0.5 \\ 0 & p > 0.5\end{cases}$$
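The threshold follows because a bluff wins \$1 when player 2 folds and loses \$1 when he calls, so its expectation is $1 - 2p$, positive exactly when $p < 0.5$ (and at $p = 0.5$ every bluffing frequency ties). A minimal sketch, with names of my own choosing:

```python
def bluff_ev(p):
    """Player 1's expected net from bluffing a losing card:
    wins $1 if player 2 folds (prob 1-p), loses $1 if called (prob p)."""
    return (1 - p) * 1 + p * (-1)   # = 1 - 2p

def optimal_bluff(p):
    """B(p): bluff always when bluffing has positive expectation, else never."""
    return 1.0 if bluff_ev(p) > 0 else 0.0

print(bluff_ev(0.25), optimal_bluff(0.25))   # 0.5 1.0
print(bluff_ev(0.75), optimal_bluff(0.75))   # -0.5 0.0
```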

Now, player 1 might have a probability distribution $f(p)$ on player 2's calling frequency. In that case, it seems to make sense that the bluffing frequency for player 1 would be the probability-weighted average of the optimal action for each specific $p$ –

$$p_{\rm Bluff} = \int_0^1 {B}(p) f(p) dp = \int_0^{0.5} f(p) dp \tag{1}$$

But that leads to the conclusion that if player 1 has an uninformative prior $f(p)=1$, he should bluff with frequency $p_{\rm Bluff} = 0.5$. That contrasts with my intuition that if you have no information about player 2's strategy, you should play the Nash equilibrium strategy.
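Equation (1) is easy to evaluate numerically for any prior; a midpoint-rule sketch (the helper names are mine) confirms the value $0.5$ for the flat prior:

```python
def B(p):
    """Optimal bluffing frequency against a known calling frequency p."""
    return 1.0 if p < 0.5 else 0.0

def p_bluff(f, n=100_000):
    """Approximate equation (1): the integral of B(p) f(p) over [0, 1],
    using a midpoint rule with n subintervals."""
    h = 1.0 / n
    return sum(B((i + 0.5) * h) * f((i + 0.5) * h) * h for i in range(n))

print(round(p_bluff(lambda p: 1.0), 6))   # 0.5 for the uniform prior f(p) = 1
```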

I'm concerned that equation (1) doesn't make any mention of the Nash equilibrium strategy at all – it seems that it should play a role, which makes me think that (1) cannot be correct.

Which is it? Does game theory have anything to say about the situation where one player might not play optimally, but you don't know exactly how he plays? Is there a way of deriving the "correct" play if you have a probability distribution for your opponent's strategy?

Best Answer

Although I am posting this as an answer, it is more of an opinion (too long to fit in the comments).

Your question concerns a field called "Epistemic Game Theory", which focuses primarily on modelling players' beliefs. "Epistemic" refers to whatever has to do with players' beliefs about other players' strategies, their knowledge, their beliefs about the beliefs of others, etc. Among many other results, there are theorems establishing the conditions under which the beliefs of the players (viewed as mixed strategies of the other players) form a Nash equilibrium.

In your example, the uninformative prior you employ can be interpreted as a strategy of Player 2 (in the eyes of Player 1). Player 1 assumes that he is playing against the uniform distribution; that is, he believes that Player 2 chooses his calling frequency $p$ completely at random. Given that belief, his optimal strategy (against the purported strategy of Player 2) is indeed what you found, i.e. to bluff $50\%$ of the time.
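A quick sanity check of that claim (a sketch of my own, not part of the original answer): if $p$ really is drawn uniformly from $[0,1]$ each round, then a bluff has expectation $E[1-2p]=0$, i.e. it breaks even on average, which is consistent with bluffing $50\%$ of the time being a best reply to that belief.

```python
import random

# Monte Carlo estimate of player 1's expected net from a bluff when
# player 2's calling frequency p is drawn uniformly each round:
# he loses $1 when called, wins $1 otherwise; the mean should be near 0.
random.seed(0)
trials = 1_000_000
total = 0
for _ in range(trials):
    p = random.random()              # player 2's purported calling frequency
    called = random.random() < p
    total += -1 if called else +1
print(total / trials)                # close to 0
```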

I do not agree with the intuition that when you do not know what the other player will do (i.e. under a uniform prior) you should play the Nash equilibrium. On the contrary, the Nash equilibrium assumes only rational behaviour, and one should play it if he believes that the other players are rational, if he believes that the other players believe that he is rational, and so on – i.e. if there is common knowledge of rationality. In fact, certain works stipulate the exact epistemic conditions under which the Nash equilibrium will be played (conditions that are in several cases weaker than the aforementioned common knowledge of rationality).

Finally, there is a discussion concerning the selection of the prior distribution. The consensus is that a common prior should be used (i.e. a prior upon which every player agrees), since, although counterintuitive at first, the common prior leads to the model that yields the most applicable and useful results. This is called the common prior assumption. It practically excludes cases where a player holds arbitrary beliefs (not supported by anything) and plays against them (viewing them as strategies of the others). Briefly, the common prior assumption states that at some point in the past we all agreed on the possible outcomes (and their probabilities), and that any subsequent changes in our beliefs are due to private information, accumulation of knowledge, etc. (That, of course, is not in conflict with the uniform prior you used.)

So, in sum, although the problem that you address is very interesting, I do not know whether it has a straightforward answer. There is no mistake (from a cursory look) in your calculations, but suboptimal behaviour and beliefs are so loosely constrained that they allow many different approaches. (Of course, the above is an opinion, and there may well be cleverer and more correct answers to your question.)
