[Math] Optimal strategy for meta paper-scissor-rock

algorithmic-game-theorygame theoryoptimization

You'll be familiar with the childhood game of paper-scissor-rock (hereafter abbreviated to PSR).

Let an 'optimal strategy' for PSR be one which will not be expected to lose in the long term against any other strategy.

It seems intuitively obvious that the optimal strategy for PSR is to, on each turn, randomly go paper with 1/3 probability, scissors with 1/3 probability, or rock with 1/3 probability. The optimal strategy should be random (rather than, say, going paper then scissor then rock in turn), because otherwise an opponent can learn how to exploit it. It should have equal probabilities of paper, scissor and rock for the same reason – if it went paper with greater than 1/3 probability, the opponent would start going scissors more frequently.

Now consider another game, that of meta-PSR. Here, there are two new options – let's call them 'diamond' and 'charcoal'. Diamond beats any of paper, scissor & rock; but loses to charcoal. Charcoal loses to paper, scissor & rock, but beats diamond.

What is the optimal strategy for meta-PSR?

Again, it seems clear that a randomised strategy is necessary. What's not clear is with what probabilities they should be distributed. For instance, if they each had 1/5 probability, then an opposing strategy could beat it by going diamond with higher frequency and charcoal with lower. Any thoughts?

Best Answer

This is a zero-sum game with payoff matrix $$\pmatrix{0&1&1&1&-1\\ -1&0&-1&1&1\\ -1&1&0&-1&1\\ -1&-1&1&0&1\\ 1&-1&-1&-1&0}$$ where the rows are diamond, rock, paper, scissors, charcoal.

There are standard techniques to solve such games (easily Googleable!); a Nash equilibrium here is to go diamond or charcoal with $p=\frac{1}{3}$ each, and rock/paper/scissors with $p=\frac{1}{9}$ each.