[Math] Two seemingly unrelated puzzles have very similar solutions; what’s the connection

graph theorygroup-theorypuzzle

I think it's an interesting coincidence that the locker puzzle and this puzzle about duplicate array entries (see problem 6b) have such similar solutions. Spoiler alert! Don't read on if you want to solve these puzzles yourself first (they're two of the best puzzles I've ever seen).

In both solutions we consider a collection of labeled boxes, each with a number inside, and then "traverse through boxes" by starting at a given box and using the number inside to decide where to go next. For example, we might start at box $1$, find the number $5$ inside, proceed to box $5$, find the number $2$ inside, proceed to box $2$, and so forth.

Furthermore, in both solutions we "traverse through boxes" for the same reason: we're interested in finding cycles. More specifically, for the locker puzzle, we're interested in the question, "If we start at box $n$, how many steps does it take to get back to box $n$?", and for the duplicate array puzzle, we're interested in the question, "Does there exist $n$ such that (1) if we start at box $1$, we will eventually get to box $n$, and (2) if we start at box $n$, we will eventually get back to box $n$?"

Since the two puzzles seem quite unrelated at first glance, I'll pose the following question:

Is there a deep reason why "traversing through boxes" (described in the 2nd paragraph above) shows up in the solution for both of these puzzles?

In addition,

Are there other interesting problems for which "traversing through boxes", whether to find cycles or for any other reason, shows up in the solution?

[Edit] I mistakenly said that for the locker puzzle, we're interested in the question, "If we start at box $n$, will we eventually get back to box $n$?" Instead, the question should be, "If we start at box $n$, how many steps does it take to get back to box $n$?"

[Edit] Thanks for all the great answers so far! However, I'm still not completely convinced there's nothing more going on here…

Best Answer

I'm not sure of a deep connection; from what I gather, it is merely the case that the locker puzzle and problem 6 have a common subproblem, which is: prove that "traversing through boxes" produces a cycle, i.e., leads us to visit a node we have already visited. In the locker puzzle, there is an extra restriction that this cycle include the starting node. This restriction holds because the locker puzzle involves a permutation, while problem 6 does not necessarily involve a permutation.

Unless I'm missing something, problems 6a and 6b can be solved simultaneously using the "traverse through boxes" technique starting at array index 1, despite the comment that "Part (b) seems to be a lot harder than part (a), and it apparently requires entirely different techniques". Since the hedge words "seems" and "apparently" were used, we can't say that the comment is false, but nonetheless, I believe it would be false without those hedge words. The key idea is that, as explained in the PDF on the locker puzzle, "player $ B_i $ never opens a locker (other than locker i) without first finding its number, so each time he opens a new locker he must find either his own number or the number of another unopened locker". The same basic reasoning applies to problem 6, with just slight modifications.

Problem 6 can be made to look a little more like the locker puzzle by taking a[1] out of the picture. We can consider b = {a[2], ... , a[n]} as the "real" array with m = n - 1 elements, and the value a[1] as the index of the start node. We can further relabel {2, ... , n} as {1, ... , m} if desired.

All we care about is finding a cycle, that is, coming across a value that is an index we have already visited. This is equivalent to finding a duplicate in array a, since the indices we visited are precisely array values in a. We can fulfill the time and memory requirements using a boolean array of size m. The boolean array allows us to see if we have visited an index yet, and of course has constant memory and is fast. (Correction: the boolean array would have linear memory, not constant. Evidently there is a better way to solve this.) We are guaranteed not to visit any array index twice, so the algorithm has linear time. If we didn't think of a boolean array, we might be tempted to use some sort of a list structure with fast insert and find, but a boolean array is faster.

To tie the problems together further, we can interpret the data as a directed graph where each vertex has outdegree 1. In the locker puzzle, the indegree for each vertex is also 1, but problem 6 has no such restriction.

The outdegree being 1 guarantees that we can traverse from one vertex to another without any ambiguity. Beyond that, we need to guarantee a cycle will occur regardless of where we start.

A permutation is partitioned into cycles, so this allows us to conclude for the locker puzzle that we will find our cycle precisely when we get back to the starting node. For problem 6, we are guaranteed to find some cycle, because otherwise we would have to go on endlessly, but the array is finite, so that is absurd.

Maybe my answer should be made more rigorous, but I hope it is at least satisfactory and accurate. If not, let me know, and I'll try to fix it. I admit that after looking at this problem for an extended period of time, my mind is getting a bit foggy..

EDIT: Oh I think I figured out how to solve 6b with constant memory. You can follow n - 1 links, thus covering n (not necessarily distinct) array indices of a, and by this time you are guaranteed to be in the middle of a cycle. Note your current array index. Then go around until you see that index again, which will give you the period of the cycle. Then using this you can find the "entry node" of the cycle, which will be your duplicate. (Use modular arithmetic.) I haven't worked it out in detail but I believe that's an accurate sketch.

Related Question