[Math] best strategies for ‘Squid Game’ episode 6 game 4 “marble game” players

game theorylinear algebranash-equilibriumprobabilityprobability theory

Two players each get $n=10$ marbles. Every alternating turn, one player (say, the player whose turn it is) hides an amount of own marbles in fist, and, the other player must guess if hidden amount is odd or even, and, that other player (i.e., the player whose turn it is not) also must bet an amount of own marbles. If guess is right, the player whose turn it is gives (as much as possible) the amount of bet marbles to opponent. If guess is wrong, the player whose turn it is takes the amount of bet marbles from opponent. Next, the turn now alternates to the other player. The game stops when one player has all $2n=20$ marbles.

The losing player gets killed (in the series, that is).

Which strategies for both players (perhaps one strategy for the one who gets first turn, and one strategy for the other player) give maximal probabilities to win (and to not get killed).

We must assume both players know the starting conditions and are perfect mathematicians and logicians.

On a side note: is this an old or new game? If it is a known old game, can anyone tell where its 'official' rules (and, maybe, solution(s)) are documented?

remark

(series details coming so warning: spoiler ahead)

There is a YT video where rules are explained to be: if guesser guesses wrong, guesser must give amount 'h' that was hidden by hider, not amount 'b' bet by guesser.

https://www.youtube.com/watch?v=GX4AkD_vdhw

The video also gives the simple solution for that variant.

And presenter also mentions rules are not clear, but, most people, he says, believe 'h'. Some comments claim otherwise.

Fair enough.

From examples in series it is not entirely clear to me either what the rule is.

There is only one example where guesser guesses wrong.

In that example, guesser bets b=2, and hider hides h=3, and guesser guesses odd. One can see the guesser give 2 (amount b bet), and, next it is shown guesser has only one more left. The guesser stops playing. If the rule were to give 3 (amount h hidden), guesser would immediately loose, and if the guards were paying attention, guesser would have been killed. If the rule were to give 2 (amount b bet) then guesser, upon becoming hider, would also loose. But, current guesser is a cheater so, in both cases (having to give b=2 or h=3), it fits him to hold that last marble.

Note that the cheater is portrayed to be clever and his opponent to be dumb. And the opponent did not know the game.

However, if rule were to have to give h(=3), then guesser would cheat hard in not holding on to the rules, and, be lucky to still be alive. After all, the outcome of each games in the series is said to be fatal, but fair. But, if rule were to have to give b(=2), then guesser would still cheat hard by stopping the game, and, even manipulating opponents marbles. A moderately clever guard paying attention would notice game was not played fair, and, a somewhat more clever guard paying attention would even know guesser lost in any case.

Best Answer

Essentially, this is like one player choosing "odd" or "even" freely, and the other player trying to guess. That game is straightforward: both players should choose "odd" and "even" with equal probability, and then it becomes a martingale so the amount you choose to bet is irrelevant: you win with probability $1/2$.

Your game has an added complication: if it is your turn to hide marbles, and you only have one left, you can only hide an odd number and you lose.

However, all this means is that no-one will ever bet $k-1$ marbles if they have $k$ left: it would be a better strategy to bet all $k$, since this is better if you guess right, and no worse if you guess wrong (you certainly lose either way). Note that it is not possible for your opponent to have exactly $k-1$ marbles if you have $k$, on parity grounds. However, since we've established that no-one would ever bet $k-1$ marbles when they have $k$, it follows that no-one will ever have exactly one marble left at their turn to hide, so the game becomes a martingale again.

tl;dr the best strategy is "always pick even/odd with equal probability, and if you have $k$ bet any number of marbles you like as long as 1) you don't bet more marbles than your opponent has left, and 2) you don't bet exactly $k-1$". Then each player has equal probability of winning.

Related Solutions

Game Theory – Best Strategies for High-Low Game

A high-school student at IMSA and I analyzed the adversarial High-Low finding game of numbers $1\ldots N$ for $N$ up to $100$ about a decade ago.

Before presenting the answers, let me distinguish these games, in which the adversary has to choose a number in advance, and a game in which the adversary is free to alter the number chosen as long as all the answers given to date remain true. Allowing the adversary to change the answer makes the game much easier to analyze (it is no longer an imperfect information game!) and this is particularly true if we consider a variant of the game in which a single lie is permitted.

In general, there is a unique optimal hiding strategy (that is, a unique optimal distribution of probabilities of choosing the number $k$ with $1 \leq k \leq N$). But for many values of $N$, there are a range of optimal finding strategy mixes (that is, there are various decision trees that all lead to the same optimal expected finding time). In this answer I will only describe one finding strategy for a given $N$.

$N=3$: Hiding strategy is $2 \,|\, 1\,|\, 2$, meaning that the hider chooses $$P(k=1) = \frac{2}{5} \\P(k=2) = \frac{2}{1} \\P(k=3) = \frac{2}{5} $$ Henceforth, I will describe hiding strategies as just that sequence of relative values. The finding strategy is $$ P( (2)) = 3/5 \\ P(1;(3)) = 2/5 $$ This notation means the finder chooses an initial guess of $2$ 60% of the times, and the combination of [an initial guess of $1$ intending to guess $3$ next] and [the mirror reflection of that strategy, guessing 3 first] 40% of the time.

For $N=3, V(3) = \frac95$ which is 0.13333 worse than the optimum obtained by a binary search if the hider must choose $k$ as a uniform random on $[1,3]$.

For $N=4$ the optimum hiding strategy is $1 \,|\, 1\,| \, 1\,| 1$ (how boring!) and the optimum finding strategy is $P(2;(4)) = 1$, again with the understanding that that $1$ is split (half-half) between the strategy shown of choosing $2$ and then, if the answer is "higher", choosing $4$, and its mirror image of choosing $3$ and then $1$. This mirror property will hold for all the solutions I will present (although there are other finding optimal strategies for some of the games that do not have mirror symmetry). For $N=4$ the adversarial hiding game has the same value as the uniform-distribution binary search.

For $n=5$ the optimum hiding strategy is $5 \,|\, 3\,| \, 2\,| 3\,| \,5$ and the finding strategy is $$ P( (3;(1)(5) ) = 4/9 \\ P(2;(4) )= 4/9 \\ P(2;(5;(4))) = 1/9 $$ A final notation tidbit here -- the first of these strategies exhibits, for the first time so far, the information as to what you will do for each of two possible answers, saying that if you guess $3$ and the number low, you use the strategy $(1)$ but that if it is high you use the strategy $(5)$. This is why some single numbers are written in parentheses; items in parentheses represent complete finding strategies for some sub-interval.

Note that even though the numbers $1$ and $5$ are significantly more common than the other numbers, you can never (in an optimum strategy) start by guessing one of those endpoints. That sacrifices too much information in the case that the first guess does not hit $k$ immediately. $V(5) = 20/9$, which is $0.2222$ worse than the non-adversarial (uniform distribution) finding problem.

For $N=6$ The optimum hiding strategy is $5 \,|\, 3\,| \, 2\,| \, 2\,| 3\,| \,5$ and the finding strategy is $$ P( (3;(1)(5) ) = 3/5 \\ P(3;(1)(6;(5)) )= 1/5 \\ P(2;(5;(4))) = 1/5 $$ (Don't forget that since you must split these between the stated strategies and their mirrors, this is really a six-component mixed strategy.) $V(6) = 12/5$ which is 0.066667 worse than the value of the non-adversarial game (which I will $B(6)$.

For $N=7$ thru $N=10$ the optimum hiding strategy is $2 \,|\, 1\,| \, 1\,| \cdots \,|\, 1\,| 1\,| \,2$ and $$V(N) = \frac{4N-5}{N+2} = B(N) + \frac{22-2N}{N(N+2)}$$ so the adversarial game is worse than the uniform game by an amount tan falls from $0.127$ down to $0.0167$.

It would be tempting to suppose that that pretty property of hiding by doubling the chances of each end number, and otherwise choosing the number as a uniform random, holds from then on. Not that simple!

For $N=11$ the optimum hiding strategy is $58 \,|\, 29\,| \, 29\,| 34 \,| 25 \,| 22 \,| 25 \,| 34 \,| 29 \,| 29\,| 58$ and an optimum finding strategy is $$\begin{array}{lll} P(6;(3;(1)(4))(9;(8)(11))) &=& 117/744 \\ P(6;(2;(4))(10;(8))) &=& 123/744\\ P((5;(2;(4))(9;(7)(11))) &=& 348/744\\ P((4;(2)(8;(6)(10))) &=& 54/744\\ P(4;(2)(9;(7;(5))(11))) &=& 72/744 \\ P(4;(1;(3))(8;(6)(11;(9)))) &=& 30/744 \end{array} $$ I should mention the technique for solving these games: It is easy enough to write a simplex solver and tie it to something that fills in the expected values (which may have halves because a strategy includes its mirror image) for each combination of number chosen and strategy employed. The tough part is to feed in a suitable set of candidate strategies. You can't afford (for large $N$) to simply generate all possible strategies, as the number grows exponentially. For example, with $N=11$ we initially fed in $17$ plausible finding strategies. The key next step is to assume the hiding strategy found by the simplex solver is fixed, and solve for the best possible search. If the expected value is the same as the game values you just solved, then you are through; otherwise, you need to add in at least the superior finding strategy you just determined, possibly chop out some of the discarded strategies, and try again.

For $N=12$ the optimum hiding strategy looks qualitatively like that for $N=11$, it is $68 \,|\, 34\,| \, 34\,| 43 \,| 31 \,| 24 \,| 24 \,| 31 \,| 43 \,| 34\,| 34\,| 68$. Note that both of these look a lot like the basic $2,1,1,1,\ldots,2$ pattern, but with some waviness.

$N=13$ and $N=14$ fall into a third common pattern: $$129 \,|\, 77\,| \, 52\,| 62 \,| 64 \,| 48 \,| 48 \,| \cdots \,| 48 \,| 48\,| 64\,| 62\,| 52\,| 77\,| 129$$.

$N=15$ through $N=19$ obey the same pattern found for $N=5$ and $N=6$, namely $5 \,|\, 3\,| \, 2 \,| 2 \,| \cdots \,| 2 \,| 2\,| 3\,| 5$. $N=20$ foall back into the simpler $2\,| \, 1 \,| 1 \,| \cdots \,| 1 \,| 1\,| 2$ pattern. That pattern persists until $N=28$ which has a fairly ghastly hiding pattern where the least common multiple of the denominators is a 25 digit number; it still looks like a slightly wavy shallow bathtub with the end elements precisely double their neighbors.

By the time we get to $N=30$ we are back to the simple $2$ pattern, but $N=31$ has the pattern that starts with $129$. The highest one I remember is for $N=63$, for which the optimal hiding strategy is the "Fibonacci" pattern $5 \,|\, 3\,| \, 2 \,| 2 \,| \cdots \,| 2 \,| 2\,| 3\,| 5$. It may be a deep problem to determine what drives the changes in these patterns.

We also solved the one-lie fully adversarial (but no "cheating" by changing the selected number) game, for up to $N=9$. That problem is vastly more difficult, since both the finder and the hider have large sets of plausible strategies. One thing that emerged is that the naive single-error-correcting finding strategies (based on assuming that the number originally chosen was uniformly random and that lies are equally probable at each guess) are markedly inferior; typically more than $0.25$ of a guess worse than the optimal finding strategies.

Predicting results of a game (TicTacToe), with players using fixed strategies

You could definitely use a Markov chain for this, but given that there is no possibility of repeating a position in Tic Tac Toe, this is a bit overkill - elementary techniques work just fine for this sort of thing, since the answer is just successively taking weighted averages of a bunch of probabilities. For instance, if you had this position (with X to move): $$\begin{array}{ccc}X & O & X\\ O & * & *\\ *&* & *\\ \end{array}$$ where the $*$'s are empty spaces, you would find that the probability of winning/losing/drawing are just the average of the respective probabilities of each of the $5$ positions $X$ might randomly choose - which, could each be computed the same way. There are not so many positions, at least in computational terms, so just programming a computer to run this calculation is not so intensive, as long as you save the results of each computation (i.e. use memoization).

In concrete terms, the algorithm to compute this quantity is simply: let $L$ represent the current board state. First, check if someone has won in $L$ or if a draw has been reached - the probability will be either $0$ or $1$ in these cases. If not, compute every move that the player whose turn it is might reasonably make (i.e. if they can win, a winning move. If not, but their opponent could win, a move to block that. If not, any legal move). Compute the probabilities of winning from those states and take their average. Save the result. Note that this method will never touch unreachable states.

In Mathematica, this is implemented as follows - one can modify the ReasonableMoves function for other strategies - or write this in other languages. Since it seems like you have a working simulation already (unless you did 100,000 trials by hand), you could probably easily modify it to give an exact answer instead of an approximation, as long as your language has some easy way to support exact rational arithmetic and an associative container for memoizing positions.

IsWinForPlayer[p_, l_] := With[{occupied = Map[# == p &, l, {2}]},
   Or[Or @@ (And @@ # & /@ occupied), 
    Or @@ (And @@ # & /@ Transpose[occupied]), 
    occupied[[1, 1]] && occupied[[2, 2]] && occupied[[3, 3]], 
    occupied[[1, 3]] && occupied[[2, 2]] && occupied[[3, 1]]]];
IsDraw[l_] := Plus @@ (Plus @@ Map[Abs, l, {2}]) == 9;
WhoseTurn[l_] := If[Plus @@ (Plus @@ l) == 0, 1, -1];
EmptyPositions[l_] := Position[l, 0, {2}];
ReasonableMoves[l_] := 
  Module[{empty, player, possible, winning, opponentWin},
   empty = EmptyPositions[l];
   player = WhoseTurn[l];
   possible = ReplacePart[l, # -> player] & /@ empty;
   winning = Select[possible, IsWinForPlayer[player, #] &];
   If[Length[winning] > 0, winning];
   opponentWin = 
    Select[empty, 
     IsWinForPlayer[-player, ReplacePart[l, # -> -player]] &];
   If[Length[opponentWin] > 0, 
    Return[ReplacePart[l, # -> player] & /@ opponentWin]];
   possible
   ];
StartingPosition = {{0, 0, 0}, {0, 0, 0}, {0, 0, 0}};
ProbabilityOfWin[p_, l_] := 
  ProbabilityOfWin[p, l] = 
   Which[IsWinForPlayer[p, l], 1, IsWinForPlayer[-p, l] || IsDraw[l], 
    0, True, Mean[ProbabilityOfWin[p, #] & /@ ReasonableMoves[l]]];

It gives a probability of $347/1680$ for the first player winning and $169/1680$ for the second player and only takes about 1 second of computation for each on my laptop (in Mathematica - a language not known for speed). These numbers seem a lot lower than your simulation (which should have been very accurate for the number of trials) - so there might be some discrepancy in what the actual strategy used was - but the method generalizes to any strategy. This method can also be modified to find the optimal strategy by instead computing, for each position, whether it is a win, draw, or loss under optimal play by looking at whether each legal move from that position is a win, draw, or loss.

Best Answer

Related Solutions

Game Theory – Best Strategies for High-Low Game

Predicting results of a game (TicTacToe), with players using fixed strategies

Related Question