[Math] Twenty questions against a liar

algorithmscomputer sciencediscrete mathematicssearching

Here's one that popped into my mind when I was thinking about binary search.

I'm thinking of an integer between 1 and n. You have to guess my number. You win as soon as you guess the correct number. If your guess is not correct, I'll give you a hint by saying "too high" or "too low". What's your best strategy?

This is an easy problem if I always tell the truth: by guessing the (rounded) mean of the lower bound and the upper bound, you can find my number in roughly log2 n guesses at most.

But what if I'm allowed to cheat once? What is your best strategy then? To clarify: if you guess my number, you always win instantly. But I'm allowed, at most once, to tell you "too high" when your guess is actually too low, or the opposite. I can also decide not to lie.

Here's a rough upper bound: you can ask each number twice to make sure I'm not cheating, and if I ever give two different answers, just ask a third time and from then on, play the regular game. In this way, you can win with about 2 log2 n guesses at most.

I'm pretty sure that bound can be improved. Any ideas?

Best Answer

There is a fairly simple strategy which requires (1 + ε)log(n) + 1/ε + O(1) queries, for any constant ε > 0. I illustrate this strategy below.

  • First, I ask you whether or not X, the secret number, is n/2. Without loss of generality, suppose you answer "less". I learn nothing at first, because you may be lying; but for the moment I give you the benefit of the doubt.

  • I next ask you whether X is n/4. If you say "less", I don't know whether or not 0 < X < n/4, but I do know that 0 < X < n/2, because you can't lie twice. Similarly, if I go about a normal binary search, so long as you continue to answer "less", I know that your answer to the preceding query was honest. So we may reduce to the case where you say "more" to my query of whether X is n/4.

  • If I continue to take you at your word, persue a binary search, and enquire about whether X is 3n/8, you may say either "more" or "less". If you say "more", then I don't know whether X > n/2, but I do know that X > n/4, again because you can't lie twice. So again so long as you continue to answer "more" in my normal binary search, I know that your answer to the preceding query was honest.

More generally: if I consistently guess under the hypothesis that you are being honest, in any "monotonic" sequence of responses, I know that all but (possibly) the last of them are honest. So it might seem as though the worst case scenario is where your responses alternate a lot, as would occur for instance if you had chosen something like X =  n/3 . But in the alternating case too, I can be confident of the honesty of some of your answers:

  • If you say (non-monotonically) that X is less than 3n/8, I don't know whether X is less than 3n/8 for sure; but I do know that X < n/2, because again you can't have lied about both.

More generally: not only do I know that monotonic subsequences of answers are mostly honest, I know that any time that you answer "more" or "less", your previous answer of the same kind was also honest. So in fact I should be most suspicious, when I encounter long monotonic subsequences in your answers, that the answer previous to that monotonic subsequence was a lie.

What I need is a strategy that will tell me when to revisit an old question, depending on how large n is. Ideally, revisiting old questions would require very little overhead. If I encounter a monotonic sequence of f(n) responses, I revisit the last question before that monotonic sequence started.

  • For instance, if your responses to queries about n/4, 3n/8, 7n/16, etc. are all monotonically "more", I eventually ask about the last time you said "less", which is for n/2, just in case you lied back then. This allows me to avoid the scenario where you lie about n/2, and keep me guessing at (2t−1 − 1)/2t until I eliminate all possibilities, catch you in your lie, and then redo all of my binary search for queries greater than n/2.

If I do a double-check like this every time I encounter a monotonic sequence of length r, I will in the worst case query you about (r+1)log(n)/r = (1 + 1/r)log(n) times. If I catch you in a lie, I will have only "wasted" r queries, and my strategy afterwards can be just a simple binary search without double-checks; so your optimal strategy for maximizing the number of queries I make is actually not to lie at all, or to save your lie for nearly the end of the game to cost me about r additional queries. Here r can be an arbitrarily large integer; thus for any ε, I can achieve a query rate of (1 + ε)log(n) + 1/ε + O(1) by setting r > 1/ε.

Bonus problem #1. Without too much extra work, I think you can improve this to a strategy requiring only log(n) + O(log log(n)) queries, but I'm too lazy to work out the details right now.

Bonus problem #2. Generalize this strategy to the regime where you are allowed to lie to me some fixed number of times L > 0.

Related Question