[Math] Election: probability of A is in the lead from the first vote on

combinatoricsprobability theory

I was trying to solve the following problem:

In an election, candidate A receives n votes and candidate B receives m votes, where n > m. Assume that in the count of the votes all possible orderings of the n + m votes are equally likely. Let $P_{n, m}$ denote the probability that from the first vote on A is always in the lead. Find

(a – i) $P_{2, 1}$, $P_{3, 1}$, $P_{n, 1}$, $P_{3, 2}$, $P_{4, 2}$, $P_{n, 2}$, $P_{4, 3}$, $P_{5, 3}$, $P_{5, 4}$

(j) Make a conjecture as to the value of $P_{n, m}$.

I solved all parts and my conjecture was that $P_{n, m} = \frac{n – m}{n + m}$ which I sort of verified with a small app I wrote to check my solutions (JavaScript: https://repl.it/EWNM/35).

While I was working on the exercises (a – i) I kept coming up with the formula $P_{n, m} = x * \frac{n!m!}{(n+m)!}$. The $\frac{n!m!}{(n + m)!}$ part is the probability of any one of the outcomes happening. The $x$ would be the number of outcomes where A leads from the first vote on. I was trying to come up with the formula for $x$ even before reaching part (j) but I could not. After figuring out the probability $P_{n, m} = \frac{n – m}{n + m}$, I could solve for $x$ and got $x = \frac{(n – m)(n + m – 1)!}{n!m!}$.

How could I have come up with $x$ without knowing $P_{n, m} = \frac{n – m}{n + m}$?

So, instead of thinking about probabilities, how many different orderings of A and B is possible where the number of A's at any position is more then the number of B's?

Is there some kind of general formula or way of thinking about this kind of problems? What would be the solution for A always leading by 2, 3, …, i?

Thanks,
Norbert

Best Answer

Here's a more detailed account of the reflection argument than the one in the Wikipedia article. Let me know if there's any part of it that you still want me to expand on.

Let's consider all possible sequences of ballots. The idea of the proof is to divide them into three categories:

  1. Sequences where $A$ is always ahead
  2. Sequences where $A$ starts out ahead (i.e., gets the first vote), but $B$ catches up at some point
  3. Sequences where $B$ starts out ahead

For concreteness, let's say $n=3$, $m=2$. Then the categories split up like this:

  1. AABAB, AAABB
  2. AABBA (B catches up after ballot #4), ABAAB (B catches up after ballot #2), ABABA (B catches up after ballot #2), ABBAA (B catches up after ballot #2)
  3. BAAAB, BAABA, BABAA, BBAAA

Notice that category 2 and category 3 are the same size. This is not a coincidence! The "reflection" idea is to pair off sequences from category 2 with sequences from category 3.

To do this, we first comment that for any sequence in either category, the count will be exactly tied at some point. For category 2 sequences, this is by definition; if $B$ has caught up to $A$, then $A$ and $B$ are tied. For category 3 sequences, it's because $B$ starts out ahead, but $A$ must eventually win. So at some point before winning, $A$ must catch up to $B$ and they will both be tied.

For concreteness, let's focus on the last time each of these sequences is tied. In our example set, I'll write a | to indicate this point. So the two categories are:

  1. AABB|A, AB|AAB, ABAB|A, ABBA|A
  2. BA|AAB, BAAB|A, BABA|A, BBAA|A

Next, we'll look at what happens if we flip all the votes before a |. For example, AABB|A would flip into BBAA|A. Notice that:

  • We can always do this and get another valid ballot sequence — because the | denotes a point where the count is tied, flipping all the votes doesn't change the final total.
  • This takes sequences from category 2 (which start with an A) to sequences in category 3 (which start with a B), and vice versa.
  • If you do it twice, you get back to the sequence you started with.

This means it gives a way of pairing off sequences from category 2 with sequences from category 3, just as we wanted. In our specific example, this pairing is:

  • AABB|A <=> BBAA|A
  • AB|AAB <=> BA|AAB
  • ABAB|A <=> BABA|A
  • ABBA|A <=> BAAB|A

Now, we're almost done. We just need one more observation: category 3 just consists of sequences which start with a B. Since $m$ out of $n+m$ ballots are $B$ ballots, that means the probability that you end up in category 3 is $\frac{m}{n+m}$. But category 2 is the same size as category 3! Since all ballot sequences are equally likely, that means the probability that you end up in category 2 is also $\frac{m}{n+m}$. So the probability that you end up in category 1 is $$ 1-2\frac{m}{n+m}=\frac{n+m-2m}{n+m}=\frac{n-m}{n+m} $$ just as you conjectured.

Related Question