Probability – Rigged Lottery Selection Probability

lotteriesprobability

I participated in a lottery game, with $N$ different participants and $k$ prizes. Participants were random selected, and each one can only win one prize: after a person is selected they are eliminated from the lottery (i.e., no reinsertion).

The game was run a first time, let's call it $game1$.

After the winners were selected, organizers found out there were some fake participants (some people submitted their application multiple times), say this is $m$. Some of them even won a prize, say this is $j$.

Organizers decided to remove the $m$ cheaters, the $k-j$ legit winners, and select $j$ new winners. Let's call this game $game2$

The parameters for $game2$ were:

$N^{'} = N – m – k + j$

$k^{'} = j$

If the organizers instead decided to run the game from scratch after eliminating cheaters, in a hypothetical game named $game3$, we would have:

$N^{''} = N – m$

$k^{''} = k$

Given that, I calculated:

  • the probability of winning $game1$, $P^{1}_{win}$
  • the probability of winning $game2$, $P^{2}_{win}$
  • the probability of winning $game3$, $P^{3}_{win}$
  • the probability of winning the second game, not having won the first, $P^{'}_{win}$

My questions are:

  1. would the solution of “incrementally” re-running the game (i.e., $game2$ after $game1$) exactly the same as $P^{3}_{win}$?
  2. is the overall process fair for everyone? In other words, could there be at least one person who had a different probability of winning depending on being selected or not selected in the first round ($game1$)?

My reasoning:

In this kind of lottery you need to calculate the probability of being selected, $P_{win}$, considering that, at each extraction, the number of participants goes down at each extraction, as the number of remaining prizes. So you have

$P_{win}=\frac{1}{N}+\frac{1}{N-1}\left( 1-\frac{1}{N}\right)+\frac{1}{N-2}\left\{ 1-\left[ \frac{1}{N}+\frac{1}{N-1}\left( 1-\frac{1}{N} \right) \right] \right\}+…$
(k times)

We can use that formula to calculate $P^{1}_{win}$, $P^{2}_{win}$, and $P^{3}_{win}$.

As per $P^{'}_{win}$ it should be:

$P^{'}_{win}=\left( 1 – P^{1}_{win} \right)P^{2}_{win}$

So, assuming I am not mistaken, I would like to validate the following answers:

  1. $P^{3}_{win}$ and $P^{'}_{win}$ are different
  2. This is were I am undecided. I have the feeling that people who won $game1$ had a lesser chance to be selected than people who ran for $game1$, were not selected, and then ran for $game2$ (so the process was not fair for the winners of $game1$). But the overall process is "fair" because everyone was in the same situation (i.e., could win $game1$ OR $game2$)

Best Answer

This is a surprisingly tricky question! It is tricky, because turning your question ("is this fair?") into a mathematical question about probability and statistics is quite subtle, and requires making assumptions about what the organizers would have done in other situations.

The short answer is, it's not possible to answer whether it is fair, because you can't know what the organizer would have done if circumstances had turned out differently. What counts as "fair" is non-trivial to define in a precise mathematical way. I would argue that fairness is a property of the procedure or process that the organizers use, not just what they did in one instance. You don't know what the organizers would have done in a counterfactual setting where things turned out differently, so you can't know whether it is fair.

For instance, suppose that the organizers were malicious, and behave as follows: if their friend Bob is one of the winners in Game 1, they remove the cheaters, keep the non-cheating winners, and rerun for everyone else (i.e., Game 2); whereas if their friend Bob is not one of the winners in Game 1, they rerun everything (Game 3). In this case, then it clearly wouldn't be fair: Bob would have an elevated chance of winning, and everyone else would have a decreased chance of winning. Call this Scenario A.

On the other hand, suppose that the organizers behave as follows in all possible scenarios: if cheaters are detected in Game 1, then they always keep the winners, remove the cheaters, and re-run on everyone else (Game 2). In that case, it is fair. The probability of winning in Game 1+Game 2 is the same as for Game 3, as explained in Rezha Adrian Tanuharja's answer. Call this Scenario B.

Now, you tell us that you observed a situation where some people won in Game 1, cheaters were detected, and the organizers decided to keep the winners, remove the cheaters, and re-run the rest (Game 2). Can we tell, from that fact alone, whether everything is fair? No, we can't. Both Scenario A and B are consistent with the observed data. We don't know whether one of the winners from Game 1 might have been one of the organizers' friends, and we don't know what would have happened if none of the organizers' friends had won Game 1.

Of course if the organizers had decided to rerun everything (Game 3), we can make the same objection: we have no way of knowing whether the organizers would have done in that in every possible case, or if they were just biased against the people who happened to win Game 1 in this one iteration.

So, it's not answerable whether the process is fair, and it's not obvious what is the fair thing to do in this situation.

Arguably, what the organizers did is reasonable. It would be reasonable for observers to always object to rerunning everything from scratch (Game 3) and never object to rerunning just the non-winners (Game 2); if everyone agreed to do that, that would ensure that the organizers can never get away with something unfair. Unfortunately, it would also be reasonable for observers to always object to rerunning just the non-winners (Game 2) and never object to rerunning everything from scratch (Game 3); if everyone agreed to do that, that would also ensure that the organizers can never get away with something unfair. There is no obvious mathematical way to specify which of these two policies observers should adapt. As a result, in practice, there could easily be some people who follow the first policy, and some people who follow the second policy, leaving observers in a pickle where nothing satisfies everyone.


How does this connect to probability theory? One way to look at this is that there is a difference between $\Pr[\text{win}|E_j]$ vs $\Pr[\text{win}]$, where here $E$ is the event that exactly $j$ of the cheaters won Game 1, and $\text{win}$ is the probability that you are one of the winners at the end. We have

$$\Pr[\text{win}] = \sum_{j=0}^{\min(m,k)} \Pr[\text{win}|E_j] \Pr[E_j].$$

Let's say that in universe $U_{12}$, we always run Game 1, then discard all cheaters, and run Game 2 (keep the winners from Game 1 and re-run with whoever remains), while in universe $U_{3}$, we always run Game 1, then discard all cheaters, and run Game 3 (rerun everything from scratch).

As Rezha Adrian Tanuharja's answer shows, we have

$$\Pr_{U_{12}}[\text{win}] = \Pr_{U_{3}}[\text{win}].$$

In other words,

$$\sum_{j=0}^{\min(m,k)} \Pr_{U_{12}}[\text{win}|E_j] \Pr_{U_{12}}[E_j] = \sum_{j=0}^{\min(m,k)} \Pr_{U_{3}}[\text{win}|E_j] \Pr_{U_{3}}[E_j].$$

Also, we have $\Pr_{U_{12}}[E_j] = \Pr_{U_{3}}[E_j]$. However, we do not have $\Pr_{U_{12}}[\text{win}|E_j] = \Pr_{U_{3}}[\text{win}|E_j]$ -- those two probabilities are different.

So we get a different answer to your question, depending whether we interpret it as asking "is discarding the cheaters and re-running the non-winners (assuming we always do that in all circumstances) equivalent to re-running everything from scratch?", or as asking "given that we observed $j$ of the cheaters won in Game 1, is discarding the cheaters and re-running the non-winners equivalent to re-running everything from scratch?". The answer to the first question is Yes, and the answer to the second question is No.

I hope this illustrates how this is a tricky situation, and the exact answer depends on exactly what question you are asking. Conditional probabilities are tricky to reason about.