[Math] Guess the number despite false answer

algorithmic-game-theoryalgorithmsgame theoryinformation theory

This is the Guess-The-Number game with a twist!

Variant 1

Take any positive integer $n$.

The game-master chooses an $n$-bit integer $x$.

The player makes queries one by one, each of the form "Is $x$ (strictly) less than $k$?".

The game-master answers each query immediately, always truthfully except at most once.

At the end the player must guess $x$ correctly to win.

What is the best strategy for the player to minimize the number of queries used in the worst-case? Note that the player can choose each query based on all the answers to the preceding queries. I have an algorithm that takes $n + O(\sqrt{n})$ queries, but I don't know whether this is optimal. Note that we can assume that the game-master does not have to choose $x$ at the start, but only has to keep his answers consistent with at least one possible $x$.

Variant 2

Also, what is the best strategy if the game-master can answer falsely to a fraction $r$ of the queries (meaning that after the first $k$ queries the game-master has answered at most $rk$ of them falsely). Clearly if $r \ge \frac12$ then the player has no definite winning strategy iff $n>2$ because the game-master can simply answer the first query truthfully by choosing $x$ to be in the larger half, and then any such strategy must work even if the game-master now tells the player two values that he guarantees $x$ is among, in which case there is only one useful question and the game-master simply answers such questions "yes" and "no" alternately (and answers all other questions truthfully), and the player cannot ever tell which of the two $x$ is. What if $r < \frac12$?

[Edit: I don't see why this should be downvoted, because I provided my own algorithm in an answer below, even though I don't know its optimality. If anyone can prove its optimality or can provide a better algorithm, please post an answer!]

Best Answer

What a lovely question! I'll concentrate on Variant 1.

You are in fact trying to guess two numbers. In the course of $Q$ questions (which I'll number from 1 to $Q$), you are trying to determine:

  1. The number, between $0$ and $2^n$. That is $n$ bits of information.
  2. Whichth answer was a lie. This is a number between $1$ and $Q$, plus $Q+1$ to denote "no lie", which makes $Q+1$ possible values, or $\log_2 (Q+1)$ bits of information.

Since the questions each yield 1 bit of information, the best possible number of questions would seem to be $Q=\lceil n+\log_2 (Q+1) \rceil$. Yes, this is a transcendental equation with $Q$ on both sides; but it does look a lot better than the bound you have proposed.

As for the algorithm itself, what I would do is a straightforward binary search, with the understanding that:

  • I am searching not only for the number $x$ but also for the number $f$ of the false answer, where (as before) $f=Q+1$ means that there is no false answer.
  • At each stage, my aim is to maximise the information extracted by having as many possibilities coming from a "Yes" answer as from a "No" one. But the possibilities to be counted are not possible values of $x$, as they would be in an ordinary binary search, but possible values of $\langle x,f\rangle$. This gives a different choice of question from the traditional "midpoint of the interval" choice.

In what follows, I'll take $n=5$, so that $0\le x\le 31$. Simple guesswork implies that $Q$ will be 9: that is, 5 to guess $x$, plus a bit over 3 to guess $1\le f\le 10$. ($Q=8$ is not enough, because we need to account for the possibility that there is no lie).

By symmetry, the first question is quite clearly the same as in the traditional binary search: "is $x\le 16$?". Without loss of generality, let's assume that the answer is Yes. In that case, the possibilities are:

  • $x\le 16$ and $1\lt f \le Q+1$: $16Q$ possibilities.
  • $x\gt 16$ and $f=1$: $16$ possibilities.

…so indeed we have chopped the $32(Q+1)$ possibilities in half.

For the second question, we ask "Is $x\le y$?" for some integer $y$. I'm going to neglect the possibility that $y>16$ because it seems too inefficient. There are two possible answers:

Yes

  • $x\le y$ and $2\lt f \le Q+1$: $(Q-1)y$ possibilities.
  • $y\lt x\le 16$ and $f=2$: $16-y$ possibilities.

No

  • $y\lt x \le 16$ and $2\lt f\le Q+1$: $(16-y)(Q-1)$ possibilities.
  • $16\lt x\le 32$ and $f=1$: $16$ possibilities.
  • $x\le y$ and $f=2$: $y$ possibilities.

…which adds up to $16(Q+1)$, as expected. But the possibilities are divided between $Qy+16$ for Yes and the rest for No, so the most even balance is achieved when $Qy+16$ is as near as possible to $8(Q+1)$: that is, when $y$ is as close as possible to $y=8\frac{Q-1}{Q}$.

With $Q=9$, the target value is $8\times\frac{8}{9}$, which is $7\frac{1}{9}$. So the second question needs to be not "Is $x\le 8$?", as a naive binary search would ask, but "Is $x\le 7$?".

The overall algorithm before asking each question is therefore:

  1. Note all the $\langle x,f\rangle$ pairs that are still possible given the questions and answers so far.
  2. Go through all possible values of $y$.
  3. For each value of $y$, consider how many $\langle x,f\rangle$ pairs match the answer "Yes" to "Is $x\le y$?" and how many match "No".
  4. Pick the value of $y$ for which the "Yes" and "No" counts are closest together, and ask the question using that value.

Although I assert that a binary search in $\langle x,f\rangle$ space is the best possible algorithm, and although I have given an approximate value for $Q$, I have not actually proved anything.

Related Question