[Math] Ulam’s problem – guessing a chosen number in a set

algorithmsdiscrete mathematicsrecurrence-relationsrecursive algorithms

I tried to solve the following problem, which I found in the book "Discrete Mathematics and Its Applications", by Kenneth Rosen (Problem 28 of the section 7.3 of the 6th Edition):

Suppose someone picks a number $x$ from a set of $n$
numbers. A second person tries to guess the number
by successively selecting subsets of the $n$ numbers and
asking the first person whether $x$ is in each set. The
first person answers either “yes” or “no.” When the first
person answers each query truthfully, we can find $x$
using $\log n$ queries by successively splitting the sets
used in each query in half. Ulam’s problem, proposed by
Stanislaw Ulam in 1976, asks for the number of queries
required to find $x$, supposing that the first person is allowed
to lie exactly once.

a) Show that by asking each question twice, given a number
$x$ and a set with $n$ elements, and asking one more
question when we find the lie, Ulam’s problem can be
solved using $2 \log n + 1$ queries.

b) Show that by dividing the initial set of $n$ elements into
four parts, each with $n/4$ elements, $1/4$ of the elements
can be eliminated using two queries. [Hint: Use two
queries, where each of the queries asks whether the
element is in the union of two of the subsets with $n/4$
elements and where one of the subsets of $n/4$ elements
is used in both queries.]

c) Show from part (b) that if $f (n)$ equals the number
of queries used to solve Ulam’s problem using the
method from part (b) and $n$ is divisible by 4, then
$f (n) = f(3n/4) + 2$.

d) Solve the recurrence relation in part (c) for $f (n)$.

e) Is the naive way to solve Ulam’s problem by asking
each question twice or the divide-and-conquer
method based on part (b) more efficient?

(Note: here, $\log n$ stands for the base-2 logarithm of $n$.)

I will post the whole solution as an answer to my own question, because I want to know if the solution is completely correct and consistent. Thank you in advance.

Best Answer

My solution to this problem:

a) Show that by asking each question twice, given a number $x$ and a set with $n$ elements, and asking one more question when we find the lie, Ulam’s problem can be solved using $2 \log n + 1$ queries.

I will use the simplifying assumption that $n$ is a power of two; that is, $n=2^k$ for some nonnegative integer $k$.

I will successively split the sets in half, and I will ask if $x$ is in each half. Because I split each set in half, for $n=2^k$ a total of $\log n$ splits will be made in the whole process. I know that the first person is allowed to lie exactly once. Then, if I make the same question twice for the same half, I know that the first person lied if he gives different answers to both questions. In this case, if I make one more question, I will know for sure in which half $x$ is. But I will only need to make this extra question at most once in the whole process, because there can be only one lie at most. Thus, in the worst case, I will make $2\log n + 1$ questions (two for each subset and one extra question when I find a lie).

b) Show that by dividing the initial set of $n$ elements into four parts, each with $n/4$ elements, $1/4$ of the elements can be eliminated using two queries. [Hint: Use two queries, where each of the queries asks whether the element is in the union of two of the subsets with $n/4$ elements and where one of the subsets of $n/4$ elements is used in both queries.]

Following the hint, suppose that I divide the initial sets into four subsets with $n/4$ elements: $A$, $B$, $C$ and $D$. Without loss of generality, I make two questions: (1) whether the element $x$ is in $A\cup B$ and (2) whether the element $x$ is in $A\cup C$. There are four possibilities for the answer to these questions:

  • If yes is answered to both questions, then $x$ must be in $A\cup B$ or $A\cup C$ (even if only one answer is truthful). Thus, I can be certain that $x$ is not in $D$. Therefore, $D$ can be eliminated using two queries.

  • If no is answered to both questions, then $x$ is not in $A\cup B$ or not in $A\cup C$. Thus, it is certainly not in $A$, so $A$ can be eliminated.

  • If yes is answered to question (1) and no is answered to question (2), with at most one lie, several possibilities arise. Analyzing the possibilities below, we conclude that $C$ can always be eliminated in this case:

    1. If both answers are truthful, then $x$ must not be in $A\cup C$; thus, $A$ or $C$ could be eliminated;

    2. If the first answer is a lie, then $x$ is not in $A\cup B$ and not in $A\cup C$; therefore, $A$, $B$ and $C$ could be eliminated;

    3. If the second answer is a lie, then $x$ is in $A$; therefore, $B$, $C$ or $D$ could be eliminated.

  • If no is answered to question (1) and yes is answered to question (2), by an analogous reasoning, $B$ can always be eliminated in this case.

Therefore, it is always possible to eliminate one subset of $n/4$ elements by using two queries.

c) Show from part (b) that if $f (n)$ equals the number of queries used to solve Ulam’s problem using the method from part (b) and $n$ is divisible by 4, then $f (n) = f(3n/4) + 2$.

By part (b), if $n$ is divisible by four, I can solve this problem by asking 2 questions to eliminate $n/4$ elements of the initial set, and then ask $f(3n/4)$ questions for the rest of the $3n/4$ elements. Therefore:

$$f (n) = 2 + f(3n/4)$$

d) Solve the recurrence relation in part (c) for $f (n)$.

(Note: I edited out a previous solution that was incorrect, as pointed out in the comments.)

Here, we are assuming that $n$ is a power of four 4/3. Successively applying the recurrence relation:

$$\begin{align*} f(n)&=f(3n/4)+2\\ &=(f((3/4)^2n)+2)+2\\ &=f((3/4)^3n)+3\times 2\\ &=\cdots\\ &=f(1)+2k \end{align*}$$

$k$ is the number of times that $n$ is multiplied by $3/4$ (and therefore divided by $4/3$) until we reach the base case, $f(1)$. Therefore, $k=\log_{4/3} n$. Thus:

$$f(n)=f(1)+2\log_{4/3}n=2\log_{4/3}n$$

Here, $f(1)=0$, because, if the list has only one element, I don't need to ask any question.

e) Is the naive way to solve Ulam’s problem by asking each question twice or the divide-and-conquer method based on part (b) more efficient?

The naive way, given in item (a), uses $2 \log n + 1$ queries. The divide-and-conquer method uses $2\log_{4/3}n$ queries. Therefore, it seems that the first method is more efficient, because it reduces the original problem to half, whereas the second method reduces it to only $3/4$ of the original problem.

Related Question