Probability of guessing a secret code

probability

I recently talked to a friend about this problem:

Someone is trying to brute-force a secret PIN code, consisting of 4 digits (0-9).

  • There are actually 10 distinct codes that are acceptable, and guessing any of these would do.

  • The person trying to guess has a maximum of 5 tries.

What is the probability that they would guess a correct code?

Now, my thinking goes like this: For an attempt there's a $\frac{10}{10^4} = 0.1\%$ chance of success and since there are 5 tries, a total chance of 0.5% would be a good baseline, but the actual odds should be slightly more, because every failure to guess decreases the pool of possible codes.

So, computed the probability to guess as follows:

Attempt #   Correct codes left    Total codes      p(right)
1                 10                10000          0.1            %
2                 10                 9999          0.1000100010001%
3                 10                 9998          0.1000200040008%
4                 10                 9997          0.1000300090027%
5                 10                 9996          0.1000400160064%

                                      SUM          0.5001000300100%

The probability to guess in max 5 attempts is the sum of the probabilities of guessing in any of the 5 attempts.
So I got the result of $0.5001000300100\%$, which matches my intuition.


However, my friend suggested another approach: let's compute the probability of being wrong on all the 5 attempts ($p(all\ 5\ wrong)$). The probability to guess must be $p(guess\ in\ 5) = 1 – p(all\ 5\ wrong)$.
So, we computed this as follows:

Attempt    Wrong codes left       Total codes      p(wrong)
1                9990               10000          99.9            %
2                9989                9999          99.8999899989999%
3                9988                9998          99.8999799959992%
4                9987                9997          99.8999699909973%
5                9986                9996          99.8999599839936%

Now, we figured that

$$p(all\ 5\ wrong) = \prod\limits_{i\le 5} p(wrong_i) = 99.5008993700451\%$$

That means that

$$p(guess\ in\ 5) = 1 – p(all\ 5\ wrong) = 1 – 99.5008993700451\% = 0.4991006299549\%$$

which is a surprising result, because it's less than the baseline $0.5\%$.

I feel like the second approach is wrong, but cannot figure out why.

Best Answer

The second approach is right and the first one has a mistake.

Here's a suggestion for how you can see the mistake: what if everything were the same about this problem, except that there were just 16 total codes (10 of which were valid) and you had 5 guesses to find them? What would happen if you repeated your first approach in the exact same way?


EDIT: Let me try to go a bit further than just convincing you that the first one is wrong; here's a bit on why it's wrong.

The short answer for what's broken is that you're overcounting probabilities. When you add probabilities, you're implicitly assuming that the underlying events are disjoint -- that is, they can't occur together. However, that's not the case here. It's quite possible to guess correctly on attempt 1 and to guess correctly on attempt 2.

If you want to proceed in this fashion, you need to restyle your events:

Event 1: You guess right on your first try (probability: $10/10000$)

Event 2: You don't guess right on your first try, but you do on your second (probability: $\frac{9990}{10000} \cdot \frac{10}{9999}$)

Event 3: You don't guess right on your first two tries, but you do on your third (probability: $\frac{9990}{10000} \cdot \frac{9989}{9999} \cdot \frac{10}{9998}$)

etc. By doing this, you'll have constructed disjoint events, which means it's appropriate to add their probabilities. You will then recover the other answer.

Related Question