Optimal play in “Are You the One?”

combinatoricsprobability

I recently learned of a dating show called Are You the One?

A description of the show, for those interested.

The premise of the show is as follows: there are $10$ men and $10$ women. The producers of the show have decided in advance on a perfect matching (apparently using some kind of stable matching algorithm); that is, each man has exactly one "correct" woman out of the $10$ and each woman has exactly one correct man. The goal of the contestants is to determine all correct pairs over the course of the show. If they all guess correctly, then the show ends and the contestants receive a monetary prize.

Each round/episode proceeds as follows: at some point, one pair (determined by a competition) gets the chance to use the "truth booth" and checks whether or not they are a "correct" pairing. At the end of the episode, the contestants pair off in a "matching ceremony" to guess their correct matches, and then (if they have not guessed all pairs correctly) they are told how many of these pairs are correct (but not which of the matches are correct). The contestants have $10$ rounds to win.

To make things as lucid as possible, here is the simplified version of the game in which I am interested.

There are $10$ men and $10$ women who have been "paired off". There is a guesser, whose goal is to correctly determine all pairs at the end of a round. Each round of play has two parts. In the first part, the guesser chooses a man and woman and is told if this is one of the "correct" pairs. In the second part of the round, the guesser pairs off all men and women as he sees fit and is told the number of pairs that are correct.

The Question:

Is there a strategy that does better than random chance? Is there an optimal strategy? What is the highest probability of victory that you can guarantee?

An informal, more specific version of this question: in the unlikely event that the guesser finds a lot of correct matches (e.g. if they get $4$ or more matches, which is an event with probability approximately $1.9\%$), how can he use this information to the greatest effect?

Any ideas or interesting observations about the problem are appreciated!


Thoughts so far: Let $n$ denote the number of pairs.

One question that I've already answered is that of the probability of the outcomes of the matching ceremony. Guessing an overall matching is tantamount to guessing one of $n!$ permutations. The number of guesses in which $k$ pairs are correct is given by $D_{n,k}$, where $D_{n,k}$ denotes a Rencontre number. So, the probability of getting $k$ correct pairs in the ceremony is equal to $\frac{D_{n,k}}{n!}$. For $n=10$, the numerators are as follows.
$$
\begin{array}{c|cccccccccc}
k&0&1&2&3&4&5&6&7&8&9&10\\
\hline
D_{10,k} & 1334961 & 1334960 & 667485 & 222480 & 55650 & 11088 & 1890 & 240 & 45 & 0 & 1
\end{array}
$$

I'm not really sure where to go from there. In the unlikely event that $4$ or more matches

Best Answer

We can heuristically estimate the answer by looking at the amount of information we can obtain.

Consider the first round. If we choose the single pair uniformly randomly, it would give roughly $-\frac19·\log_2(\frac19) ≈ 0.35$ bits. Separately, if we choose a uniformly random $10$-pair combination, the expected number of correct matches is $10/9$, and the number of matches is approximately a binomial distribution with mean $10/9$ we would learn roughly $\sum_{k=0}^{10} (-p_k·\log_2(p_k)) ≈ 1.9$ bits where $p_k = \binom{10}{k}·(\frac19)^k·(\frac89)^{10-k}$ for each $k∈[0..10]$. This approximation is relatively reasonable because the correlations only affect the cases with many matches, which have very low probability and hence contribute very little to the sum. If we repeat this, then there is some overlap in the information gained, but the overlap would be small as long as the total information is very far from the needed amount.

The total number of bits needed is $\log_2(10!) ≈ 21.8$ Thus after 8 rounds the total information gained is roughly $8·(0.35+1.9) ≈ 18$ bits. This is more than a factor of $10$ off from $21.7$ bits, so the estimate should be reasonable. We now have on 'average' less than $2^4$ possibilities to distinguish in $2$ more rounds. Given the small number of possibilities, we should be able to choose the single pair wisely to divide the remaining cases as evenly as possible, say at least as even as $2:3$, which yields half a bit each time. For the $10$-pair combinations we can fix those pairs that we already know, and choose uniformly randomly for the rest, which would yield roughly one bit each (intuitive estimate). Thus the final $2$ rounds yields roughly $3$ more bits.

At the end of this procedure, we should have roughly $21$ bits of information, and therefore can guess correctly with probability roughly $2^{-.8} > 0.5$. This is a 50% success rate! Now, since there are definitely better algorithms, the answer is almost surely about 70% or higher, but as I mentioned in the comments it should be completely intractable to compute the optimal success rate exactly.

−−−

I was curious to see how far off my estimate was so I implemented the first phase of the algorithm (i.e. $n-2$ rounds with uniformly random single-pair and uniformly random $n$-pair combinations) to see how many possible cases are left.

On doing $20$ runs with $n=10$, the number of remaining cases was:

5,23,49,377,361,21,190,131,1,8,
73,28,58,8,24,230,3,12,54,548

On average $5.2$ bits of information are needed to distinguish the remaining cases. So the first phase only yielded $≈ 21.8-5.2 = 16.6$ bits, off from my estimate of $18$ bits by $1.4$ bits. I guess that's not bad for such a ridiculously handwaving estimate. But this also implies that my claim that we can split the cases quite evenly in the second phase would be wrong, and furthermore the very heavy-tailed distribution of the remaining number of cases was a bit unexpected.

I think we can in fact get at least $3$ bits from the final phase, so assuming we can split the remaining cases into $2^3 = 8$ equal parts, and then guess randomly between the cases in whichever part we end up in, the above data suggests that the total success probability of this procedure would be about 40%. So despite the very bad heuristic estimate, the claimed success rate was not that far off in the end.

In fact, it turns out that if we simply have only the first phase, namely we randomly choose everything and do not even try to optimize near the end, on $20$ runs the remaining cases at the end were:

14,1,1,5,10,1,3,1,2,2,
9,27,2,188,2,2,2,80,1,1

And so the probability of guessing correctly is roughly 49%. This in fact is very strong evidence that we can optimize the final phase in my original algorithm to get roughly 65% success probability (if we manage to reduce each $2$ to a $1$)!

Anyway, here is the Javascript code for anyone who wants to play with it (m is the number of rounds in the first phase):

var n=10;
var m=8;
var t=[];
var c=[];
function gen()
{
    for(k=0;k<m;k++)
    {
        t[k]=[];
        for(a=0;a<n;a++) t[k][a]=a;
        for(a=1;a<n;a++)
        {
            s=Math.floor(Math.random()*(a+1));
            if( s<a ) { var u=t[k][a];t[k][a]=t[k][s];t[k][s]=u; }
        }
        c[k]=0;
        for(a=0;a<n;a++) c[k]+=(t[k][a]==a?1:0);
    }
    for(k=m;k<m*2;k++)
    {
        t[k]=[];
        for(a=0;a<n;a++) t[k][a]="·";
        s=Math.floor(Math.random()*n);
        d=Math.floor(Math.random()*n);
        t[k][s]=d;
        c[k]=0;
        for(a=0;a<n;a++) c[k]+=(t[k][a]==a?1:0);
    }
    for(k=0;k<m*2;k++) console.log(t[k]+" : "+c[k]);
}
function solve()
{
    var v=[];
    var r=[],rc;
    for(a=0;a<n;a++) r[a]=a; rc=n;
    var count=0;
    function test(d)
    {
        if( d==n ) { console.log(v+""); count++; return; }
        for(var a=0;a<rc;a++)
        {
            var u=r[a];
            r[a]=r[rc-1];
            rc--;
            v[d]=u;
            var pass=true;
            for(var k=0;k<m*2;k++) if( u==t[k][d] ) { c[k]--; if( c[k]<0 ) pass=false; }
            for(var k=0;k<m*2;k++) if( c[k]>=n-d ) pass=false;
            if( pass ) test(d+1);
            for(var k=0;k<m*2;k++) if( u==t[k][d] ) c[k]++;
            rc++;
            r[a]=u;
        }
    }
    test(0);
    console.log(count+" solutions");
}
n=10;m=10;gen();solve();