[Math] Lower bound in algorithmic puzzle


Puzzle: there are $n$ computers most of which are good; the others may be bad ("most" in the strict sense: there are strictly more good computers than bad ones). You may ask any computer $A$ about the good/bad status of another computer $B$. if $A$ is good it will correctly indicate $B$'s status, but otherwise it may answer whatever.

Your goal is to locate a good computer using the minimum number of questions in the worst case. In other words, devise an algorithm that requires no more than $N$ questions regardless of the outcome and is guaranteed to pinpoint a good computer, and make $N$ as small as possible.

The original puzzle asks for the optimal $N$ when $n=100$.

Warning: spoilers below. Stop here if you wish to think about this fun puzzle.


I, and everyone else I know who solved this, can do the $n=100$ case with $97$ questions in the worst case. I'm pretty sure this is optimal but I do really miserably on lower bounds. The simplest case where I can't match the bounds is $n=7$ (at most $3$ bad computers): this is doable with $5$ questions and I can rule out $3$ but I can't rule out $4$.

More generally, if the number of bad computers is at most $k$ (so $n=2k+1$ or $n=2k+2$), I can show that at least $k+1$ questions are needed while $2k-1$ questions suffice. Can anyone narrow that gap?

EDIT: starting a bounty, looking for improvements to the lower bound (or the upper bound, though I'd be surprised if the latter is possible) for general $n$. A transparent argument for why 7 computers require 5 questions is also good, but a computer-assisted case-by-case enumeration is not.

Best Answer

This is a very fun problem, which I have encountered in the literature as the knights and spies problem. Good computers are called knights, because they always tell the truth, while bad computers are called spies, because they say whatever they want. (Traditionally, a liar would be called a knave.) I will use this terminology because I feel it is more consistent with other liar/truth-teller puzzles.

Let me briefly discuss an alternative objective: figure out everyone's identity, not just a single knight's. I encourage you to think about this problem before reading further. This is an interesting problem, in part because it's exciting how many different bounds I have seen people come up with:

  • $n^2$ or $n(n-1)$ by asking everyone about everyone
  • $\Theta(n\sqrt{n})$ using a standard "square-root" trick or $\Theta(n\log{n})$ by being smarter about the recursion
  • $5n + O(1)$, $3n + O(1)$, and $2n + O(1)$ by increasingly efficient methods related to pairing up people
  • $3n/2 + O(1)$ optimally!

The 2009 paper "Knights, spies, games and ballot sequences" by Mark Wildon proves that the final bound is optimal, computing the exact value of the $O(1)$ for every $n$. Mark Wildon has a webpage with additional information about the problem which notes that this solution was previously published by Pavel M. Blecher in the 1983 paper "On a logical problem". The strategy used in these papers, which Wildon calls the "Spider Interrogation Strategy" is fantastic.

Your problem of identifying a single knight is more challenging! I'll go straight to the punch by saying that the optimal bound is $n-1 - H(n-1)$ where $H(k)$ is the number of 1 bits in the binary representation of $k$. I haven't carefully read the other answer to this question, but as far as I can tell, this bound can easily be achieved by a trivial modification of that strategy.

The upper bound isn't too bad, but the lower bound is much trickier. When my friends and I thought about (and solved!) this problem, we used a reduction to the following problem:

There are two players, the Questioner and God, and a collection of (nonnegative) numbers. The Questioner's aim is to achieve a violation of the (strict) triangle inequality, that is, have a single number be greater than or equal to the sum of the rest. God's aim is to not have this happen for as long as possible. Each move consists of the Questioner asking about two numbers, to which God replies "sum" or "difference"; the two numbers $a$ and $b$ are then replaced by either $a+b$ or $\lvert a-b\rvert$ accordingly. Since the number of numbers decreases by one each turn, eventually there will be two numbers and the (strict) triangle inequality is necessarily violated. If God manages to not lose before then, we say that "God wins".

Here are some fun facts:

  • Consider a starting position of $n$ ones. Then God wins if and only if $n$ is 1 more than a power of 2.

  • Suppose God has a winning position. Then his answer to any question the Questioner can ask is forced; that is, it is always the case that one of the two answers is losing. We call this "Uniqueness". The way we think about it is that God somehow has to be very careful in answering every single question, and indeed, if you look at optimal play in "endgame" positions, it's hard to predict what the correct answer for God is.

  • Suppose the starting position is $n=2^k+1$ ones. By the above two points, this is a winning position and God must carefully answer every question so as not to lose. Nonetheless, the answer to the first $(n-3)/2$ questions asked will necessarily be "sum".

So there's this phenomenon that in a certain class of positions, God answers blindly for the first half of the time, and then has to be very careful afterwards. It's very weird.

We finally classified winning positions. It turns out that the first bullet point above is true because $\binom{2n}{n}/2$ is odd only when $n$ is a power of 2.

Some more quick observations:

  • With regards to the reduction: this sum/difference game is just what you get if the spies are knaves. In that case, the knights and spies are just two groups of people which support themselves but accuse the others.

  • The lower bound $n-1-H(n-1)$ holds even if you're trying to find a knight or a spy.

Let me finish with some references to the literature. This result has been published many times! The term used in the literature for this game is "the majority game".

You have a group of $n$ items, each with one of $k$ labels. One of the labels is shared by a majority of the items. You are allowed to ask if two items have the same label, and the goal is to minimize the number of questions necessary to identify a majority element.

The best studied case is when $k=2$. The problem is often presented as played by a questioner and an answerer. This is not exactly the same game, because in principle spies could tell the truth. However, the lower bound reduction above is to the case when the spies are knaves, and then this is the same game.

Here's a summary of the relevant literature:

  • Michael J. Fischer and Steven L. Salzberg.
    Finding a majority among $n$ votes.
    J. Algorithms 3 (1982) 375--379.

    They consider the problem when $k$ is unknown, and a majority may or may not exist. They prove that the optimal bound is then $\lceil 3n/2 - 2\rceil$ questions. You may recognize this as the bound from Mark Wildon's paper. Now, I am a little confused because it doesn't seem to me that the problems map onto each other exactly, but I find it hard to imagine that it's an accident.

    Actually, the algorithm they present seems strikingly similar to Wildon's "spider interrogation strategy". Because Fischer and Salzberg are in a slightly different model (in particular one that is symmetric w.r.t. asking $x$ about $y$ and $y$ about $x$), you have to change some of the details. I don't think they don't exactly map onto each other (in terms of the questions asked), but they're similar.

  • Saks, Michael E. and Werman, Michael.
    On computing majority by comparisons.
    Combinatorica 11 (1991), no. 4, 383--387.

    They show that if $n$ is odd, the optimal bound is $n-H(n)$, where $H(n)$ is the number of 1-bits in $n$.

    Their analysis proceeds by first reformulating the game as between a selector and an assigner. A position in the game is a multiset of integers, and the rules are as in the sum-difference game, e.g. "the game ends when the largest number in the multiset is at least half the total." They then have a slick proof of the upper bound! (In their notation it's a lower bound.) I mention this because even though the upper bound isn't hard, some analyses gets mired in minor details.

    They then define a 2-adic valuation invariant which my friends and I also discovered. Their proof uses generating functions which ours did not, but I believe the actual invariant is the same as ours. Finally, they apply the invariant result to the position consisting of $2h+1$ ones by computing the valuation of $\binom{2h}{h}$.

  • Alonso, Laurent; Reingold, Edward M.; and Schott, René
    Determining the majority.
    Inform. Process. Lett. 47 (1993), no. 5, 253--255.

    They present a short proof of the previous paper's results. The upper bound uses the same neat trick. They then prove the lower bound with essentially the same 2-adic invariant technique, except with a different exposition. Admittedly, it's cleaner and doesn't use generating functions.

  • Wiener, Gábor
    Search for a majority element.
    International Conference (of the Forum for Interdisciplinary Mathematics) on Combinatorics, Information Theory and Statistics (Portland, ME, 1997). J. Statist. Plann. Inference 100 (2002), no. 2, 313--318.

    He proves some related results, including the following:

    If asking about $x$ and $y$ is an optimal question, then there exists an optimal question that doesn't ask about $x$. It follows that the optimal number of questions is monotonic, in the sense that adding another 1 into the position does not decrease the optimal number of questions.

Related Question