[Math] The Chow & Robbins game ≈ 0.79295350640: improvements could come from simple statistics, or from a continuous version of the game

differential-equations · game-theory · na.numerical-analysis · st.statistics

This question seeks help with improving a numerical estimate of the value of the Chow and Robbins game. Much about this game is unknown, such as whether its value is rational, but there are two routes to improving my numerical estimate. One would be easy for a statistician; the other requires something more subtle.

Toss a coin repeatedly until you say “stop”, when you score the proportion of tosses that are heads = $h / (h+t)$. So if the first toss is a head, stop and score 1, the maximum. If tosses are tail-head-head-head, then stop and score ¾ (the optimal strategy after tail-head-head is to toss again, with expected value ≈ 0.6694 > ⅔).

Obviously one could, at any time, undertake to keep tossing until $h=t$, which happens eventually with probability one.

So a simple strategy would be to stop if the first toss is a head, otherwise to toss repeatedly until $h=t$. This scores 1 with probability ½ and otherwise eventually scores ½, so the value of the game must be ≥ ½·1 + ½·½ = 0.75.

Numerically one can get a lower bound for the value of the game by building an $n$×$n$ spreadsheet, with $h=t=0$ in the top left, tails increasing to the right and heads increasing down. Cells are then filled with $=Max( h/(h+t), Avg(↓,→), ½ )$. This chooses the best of: stopping now and scoring $h/(h+t)$; tossing once more, with half a chance of a head and half of a tail; and tossing forever (relevant only on the boundary). Obviously stopping is disallowed when $h+t=0$.
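For readers without a spreadsheet to hand, the same backward induction can be sketched in a few lines of Python (the function name and grid layout are mine, not from the spreadsheet):

```python
def lower_bound(n):
    """Value at (h=0, t=0) of the game truncated at an n-by-n grid.

    Boundary cells take max(stop, 1/2), mirroring the spreadsheet's
    choice between stopping now and tossing forever.
    """
    # v[h][t] = value of the truncated game with h heads and t tails so far
    v = [[0.0] * (n + 1) for _ in range(n + 1)]
    for h in range(n, -1, -1):
        for t in range(n, -1, -1):
            stop = h / (h + t) if h + t > 0 else 0.0   # stopping disallowed at the start
            if h == n or t == n:
                v[h][t] = max(stop, 0.5)               # boundary: stop now or toss forever
            else:
                toss = 0.5 * (v[h + 1][t] + v[h][t + 1])
                v[h][t] = max(stop, toss, 0.5)
    return v[0][0]
```

Increasing `n` should climb toward the figures quoted below, at the cost of $O(n^2)$ cell updates.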

And if the spreadsheet size is 1024×1024 the value of this constrained game is an edge more than 0.79272249. Obviously doubling the size of the spreadsheet would increase the value of the game, and doubling again increases it again, though by only 0.602× as much. This ratio varies little over large increases in $n$. So one can estimate the value at the limit as the current estimate, plus the most recent difference divided by one less than the reciprocal of the ratio.
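That one-step geometric extrapolation (essentially Aitken's Δ² process applied to three grid sizes in geometric progression) can be written out as:

```python
def extrapolate(v1, v2, v3):
    """Estimate the n -> infinity limit from game values at sizes n, 2n, 4n,
    assuming successive differences shrink by a fixed ratio r."""
    d1, d2 = v2 - v1, v3 - v2
    r = d2 / d1                       # observed ratio, ~0.602 in the post
    # limit = v3 + d2*r + d2*r^2 + ... = v3 + d2 / (1/r - 1)
    return v3 + d2 / (1.0 / r - 1.0)
```

For an exactly geometric sequence $v_k = L - c\,r^k$ this recovers $L$ exactly; the error comes from the ratio not being quite constant.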

(Diagrams and data at jdawiseman.com.)

This leads to the first question, the easy statistical one.

Easy statistical question

Currently I’m estimating the value at the limit using only three spreadsheet sizes, and these in geometric progression. Surely it is possible to get a better estimate, and some sense of ±error around that, using all the data at once. Presumably one should model the value of the restricted game as being something like $0.79295350640 - 0.036659 \times n^{-0.7311766}$, with constants to be better estimated. My guess is that this would be done with a weighted least-squares fit, giving greater weight to larger $n$. So, using all the data at once, what is the best estimate of the constant term and its error bounds?
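One possible shape for such a fit is sketched below: a grid search over the exponent $p$, with weighted linear least squares for the other two constants at each candidate $p$. The weighting by $n$ is an ad-hoc choice, and proper error bars would need a covariance estimate or a bootstrap on top of this:

```python
import numpy as np

def fit_limit(ns, vs, p_grid=None):
    """Fit v(n) ~= c - b * n**(-p): grid-search p, weighted LS for (c, b)."""
    ns, vs = np.asarray(ns, float), np.asarray(vs, float)
    w = ns                                    # weight larger n more (ad hoc)
    if p_grid is None:
        p_grid = np.linspace(0.3, 1.2, 901)   # step 0.001 in the exponent
    best = None
    for p in p_grid:
        # For fixed p the model is linear in (c, b): v = c - b * n**(-p)
        A = np.column_stack([np.ones_like(ns), -ns ** (-p)]) * w[:, None]
        coef, *_ = np.linalg.lstsq(A, vs * w, rcond=None)
        sse = float(np.sum((A @ coef - vs * w) ** 2))
        if best is None or sse < best[0]:
            best = (sse, coef[0], coef[1], p)
    _, c, b, p = best
    return c, b, p
```

On synthetic data generated from the model itself this recovers the constants; on the real spreadsheet values it would only be as good as the power-law assumption.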

A continuous game that approximates the estimate at the discrete game’s boundary

My approach uses backward induction from a boundary, the value at the boundary being deemed to be the better of stop = $h/(h+t)$ and toss forever = $½$. A better estimate at the boundary is likely to improve the speed of convergence quite dramatically.

Return to the unconstrained game, so on the spreadsheet of infinite size. For each $t$ there is some $h[t]$, such that one would stop with $h ≥ h[t]$, and continue otherwise. Let the not-yet-defined continuous game have a value, $EV[h,t]$, of which I know some of the properties:

• ∃ EV[h,t] continuous $∀ h≥0, t≥0$.

• ∃ $h[t] ≥ t$ such that $h≥h[t] ⇒ EV[h,t]=h/(h+t)$.

• $h < h[t] ⇒ ∂EV[h,t]/∂h = –∂EV[h,t]/∂t$. This is the analogue of the “Avg(↓,→)” from the discrete game.

• For fixed $h$, $t→∞ ⇒ EV[h,t]→½$.

• Then $h[t]$ is chosen to maximise $EV[0,t']$ for some $t'≥0$.

• Any other sensible properties.

Hard ill-defined continuous game-theoretic question

Define a continuous version of the Chow & Robbins game, with known analytic solution, that can be used as the boundary approximation in the computation of the value of the discrete game.

Best Answer

This is a comment, but a little long to fit in the comment section.

Here is one attempt at a better termination strategy to execute after reaching $n$ tosses. This might improve the speed of convergence.

If $h_n\lt t_n$ at the end of the time period, the strategy above continues play until $h=t$ and then stops. Instead, for any $a$, you could continue play until $h = t+a$. This is a reasonable strategy even if $h_n$ is slightly greater than $t_n$. We'll compute the value of this strategy conditioned on $(h_n,t_n)$, then choose $a$ depending on $h_n$ to optimize the expected value. This is still not optimal, but it is a better strategy, so the error from truncating at depth $n$ is smaller.

The value of continuing from $(h_n, t_n =n-h_n)$ until $h = t+a$ can be computed from the probability distribution for the first hitting time of $b = a-h_n+t_n$ of a random walk starting at the origin. The number of paths from $(h_n,t_n)$ which stop at $((x+n+a)/2,(x+n-a)/2)$ at time $x+n$ is ${x-1 \choose (x-b)/2}-{x-1 \choose (x-b-2)/2}$ by reflection. The expected value of continuing until $h = t+a$ from $(h_n,t_n)$ with $h_n \lt t_n+a$ is

$$\sum_{x \,\mid\, x-a+h_n-t_n ~\text{even}} \frac{(x+n+a)/2}{x+n}\, 2^{-x} \left({x-1 \choose (x-a + h_n-t_n)/2} - {x-1 \choose (x-a +h_n-t_n-2)/2}\right) $$

This type of sum has a closed form expression, which Mathematica finds to be

(1/(a - 2 h +  2 n))2^(-a + 2 h - n) (a - h + n) HypergeometricPFQ[{a/2 - h + n/2, 1/2 + a/2 - h + n/2, a/2 - h + n, 1 + a - h + n}, {1 + a - 2 h + n, 1 + a/2 - h + n, a - h + n}, 1]

That looks ugly, but particular values seem to be in $\mathbb Q + \pi ~\mathbb Q$ or $\mathbb Q + (\log 2)~ \mathbb Q$. For example, if $n=100, h=40$, it appears that the optimal fixed strategy is to continue until $h = t+26$, which has an average value of $19100588932317484511620187/37791031587382532530550556 \approx 0.505427$.
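The quoted value can be cross-checked by summing the first-passage series numerically. The term-to-term recurrence for consecutive first-passage probabilities below is my own rearrangement, and the truncation point is chosen so that the tail of the sum (which decays like $x^{-3/2}$) is negligible at the displayed precision:

```python
def ev_fixed_barrier(n, h, a, max_extra=1_000_000):
    """Expected score of: from (h heads, t = n - h tails), keep tossing
    until heads - tails = a, then stop.

    After x extra tosses the score is 1/2 + a/(2(n+x)).  The walk hits
    the level b = a - h + t with probability one, so the leading 1/2
    needs no truncation correction; only E[1/(T+n)] is summed, using
    the first-passage probabilities P(x) of a simple random walk.
    """
    t = n - h
    b = a - h + t                  # distance the walk must climb
    p = 0.5 ** b                   # P(T = b): an unbroken run of heads
    s = 0.0                        # accumulates E[1 / (T + n)]
    x = b
    while x <= max_extra:
        s += p / (x + n)
        k = (x + b) // 2
        # First-passage ratio P(x+2)/P(x) = x(x+1) / (4 (k+1)(x+1-k))
        p *= x * (x + 1) / (4.0 * (k + 1) * (x + 1 - k))
        x += 2
    return 0.5 + 0.5 * a * s

value = ev_fixed_barrier(100, 40, 26)   # should be close to 0.505427
```

Scanning `a` in the same way reproduces the choice of barrier, at the cost of one series evaluation per candidate.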

There are better non-constant termination strategies, but it's not clear to me which of these can be evaluated.
