Variant of Binary Search Algorithm (Substring Search)

binarycombinatoricsresearch

I've recently tried exploring my own variant of a standard binary search. We search for a particular binary string from the set of all binary strings of length $k$. We can ask if the element has any particular substring, and the goal is to guess the chosen element using the fewest questions possible.

For example: Consider {$0,1$}$^3$ =
{$000$, $001$, $010$, $011$, $100$, $101$, $110$, $111$}. We can ask if the element has the substring $01$, and $4$ of them would, and so forth.

I am trying to prove that the best move at any possible turn is the one that halves the search set (although I'm still not sure if this conjecture is correct). I can't find a slick way to do it.

On that note, I can't seem to to find any pattern as to what the best question to ask is given any value $k$. Through lots of trial and error (and Python), I've found the best question to ask for small values of $k$ (up to 5). I know that I can set up recurrence relations and determine the number of binary strings of any particular length that have an arbitary substring, but I'm not sure how to proceed from there.

My final goal is to prove that my variant of the game still takes O(log n) time (just like the regular binary search). Any thoughts?

Best Answer

Not quite sure if this answers your question.

An algorithm using about $2k$ questions is the following:

At step $i$, assume that we already know that the answer contains a substring $s$ of length $i$.

We ask the string $s0$. If it appears, then we can already proceed to the next step. Otherwise, ask the string $s1$. If it appears, then we can also proceed to the next step.

In case neither of them appear, we know that $s$ must appear at the end of the answer, and all subsequent steps may just ask for $0s$.

Therefore, at most $2$ questions are needed to proceed one step, and we are done at step $k$.

This is probably not optimal, but it's already $O(k)$, i.e. $O(\log n)$ for $n=2^k$.

Related Question