Probability – Probability of Guessing Correct Password on k-th Try

combinatoricsprobability

I am currently reading the course notes on probability theory from Stanford CS109. In order to practice a bit and get used to the topics presented, I am also trying to solve the problem sets, but I came across one which I find a bit ambiguous. It's problem 11a from Problem Set 1:

Say a hacker has a list of n distinct password candidates, only one of which will successfully log her into a secure system.

a. If she tries passwords from the list at random, deleting those passwords that do not work,
what is the probability that her first successful login will be (exactly) on her k-th try?

We suppose that k is between 1 and n (inclusive), otherwise the problem is trivial.

One way of thinking about the problem would be the following: the probability of being successful on the k-th try is the product of the probabilities of choosing a wrong password on the first k – 1 steps multiplied by the probability of being successful on the k-th step, which is:

$$ p = \frac{n – 1}{n} \frac{n – 2}{n – 1} \cdots \frac{n – k + 1}{n – k + 2} \frac{1}{n – k + 1} = \frac{1}{n} $$

Another way would be to think of all the n! permutations of the n passwords and see every permutation as the order in which the hacker will try the passwords. We are interested in finding the number of permutations that have, let's say, element x (the correct password) on the k-th position. There are (n – 1)! such permutations, therefore the answer will be still $ \frac{1}{n} $. There is one problem with this approach: Let's suppose there are 3 passwords: $p_1$, $p_2$ and $p_3$. There are 6 possible permutations of these 3 passwords:

$$ p_1, p_2, p_3, \\
p_1, p_3, p_2, \\
p_2, p_1, p_3, \\
p_2, p_3, p_1, \\
p_3, p_1, p_2, \\
p_3, p_2, p_1, $$
so if we want to find the probability of guessing the correct password from the first try, using the above formula we'd say it is $\frac{1}{3}$. However, as long as the hacker found the correct password from the first try, the first 2 permutations are actually equivalent (we are not interested in the order in which the hacker was supposed to try the next passwords – as long as she found the correct one, she stops – or at least this is how I interpret the problem), so the correct probability would actually be $\frac{1}{5}$.

Because of this reason, the first approach is also incorrect. My explanation for this would be the following: if we denote by $T_i$ the i-th trial and $T_k=1$ when the hacker guesses the correct password on the k-th try, what we want to find is
$$ P(T_0 = 0 \wedge T_1 = 0 \wedge \cdots \wedge T_{k-1} = 0 \wedge T_k = 1) ,$$
(where by $\wedge$ we denote the probability of the events occuring together) which would be equal to the product of the probabilities of each event only in the case when all the events are independent (and in our case they are not, as we stop trying passwords once we guess the correct one)

The final (and I suppose, correct) perspective is: the probability of guessing on the k-th try is represented by the number of ways in which the hacker can guess on the k-th try divided by the total number of ways in which the hacker can guess the password in general (which is the sum of ways in which the hacker can guess the password on the i-th try, with i from 1 to n). There are:
$$ (n-1)(n-2)\cdots(n-k+1) = \prod_{i = 1}^{k – 1} (n-i), \text{if k > 1} $$
$$ 1, \text{if k = 1} $$
ways in which she can guess the correct password on the k-th try, so the result would be:
$$ \frac{\prod_{i = 1}^{k – 1} (n-i)}{1 + \sum_{j=2}^{j=n} \prod_{i = 1}^{i = j – 1} (n-i) } $$

My question is: is my reasoning correct and if not, where is the mistake ? The different perspectives are the result of a debate with someone, but we couldn't agree on the correct answer (although I am pretty sure that the last one is correct, but there might be something I am missing)

Thank you!

Best Answer

  1. First of all, only the first approach is correct ($1/n$)

  2. the thought behind the second approach is not correct

So if we want to find the probability of guessing the correct password from the first try, using the above formula we'd say it is $1/3$. However, as long as the hacker found the correct password from the first try, the first 2 permutations are actually equivalent (we are not interested in the order in which the hacker was supposed to try the next passwords - as long as she found the correct one, she stops - or at least this is how I interpret the problem), so the correct probability would actually be $1/5$.

If we are only considering up to the first try, actually row 1 and row 2 are the same, row3 and row 4 are the same, and row5 and row6 are the same. So the probability of getting the password correct in the first try is still 1/3, but not 1/5

  1. The logic in your third approach is not correct neither, because not all tries have the same weight. (kind of similar mistake on your thoughts behind the 2nd approach)
Related Question