[Math] Probability of multiple-choice answers in questions

probability

Assume a multiple-choice test is taken by a student who has no prior knowledge of the subject and picks every answer at random.

1. What is the probability of him getting 100% in the test?
2. What is the probability of him getting at least 40% in the test?
3. Given that N students take the test (assuming all of them pick answers at random), what is the average score?
4. What effect do the number of questions, the number of students and the number of choices have on the overall average percentage of marks?
5. Do the results vary considerably across tests?
6. Which is the better strategy for getting more marks: picking answers at random, or always selecting a single choice (such as A)?

Let Q be the number of questions, C the number of choices per question and N the number of students. For this illustration, let Q = 10, C = 3 and N = 1000. I arrived at the following:

Question 1:

1/(C^Q). So 1/(3^10)

Question 2:

The probability of getting at least 40% is 1 minus probability of getting less than 4 questions correct. So 1-(1/(3^1+3^2+3^3))

Question 3:

1/C. I ran a simulation program and got this result, but I can't prove or derive it mathematically.

Question 4:

Deducing from 3 above, only C has an impact on the average score, irrespective of Q and N (assuming N is large).

Question 5:

Since C is the only major determinant of the overall scores, the results aren't going to vary across tests.

Question 6:

Randomly picking answers is better than always picking the same choice. Again, I observed this in a simulation but cannot derive it mathematically.

Am I right? How can I derive the answers to Question 3 and Question 6 mathematically?

Best Answer

Question 1

You're right, it is $1/C^Q$.
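The answer does not cover Question 2. As a hedged sketch: if every question is guessed independently with success probability $1/C$, the number of correct answers follows a ${\rm Binomial}(Q, 1/C)$ distribution, so the probability of at least 40% is a binomial tail sum rather than the expression proposed in the question. The helper name `prob_at_least` is made up for this illustration.

```python
from fractions import Fraction
from math import comb

def prob_at_least(q, c, min_correct):
    """P(at least min_correct of q questions are right) when each of the
    q questions is guessed uniformly at random among c choices."""
    p = Fraction(1, c)  # chance of guessing a single question correctly
    return sum(comb(q, k) * p**k * (1 - p)**(q - k)
               for k in range(min_correct, q + 1))

Q, C = 10, 3                          # values used in the question
p_perfect = prob_at_least(Q, C, Q)    # Question 1: all Q answers correct
p_pass = prob_at_least(Q, C, 4)       # Question 2: at least 4 of 10 (40%)

print(p_perfect, float(p_perfect))
print(p_pass, float(p_pass))
```

With these numbers the first value coincides with $1/3^{10}$ and the second comes out at roughly 0.44.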

Question 3

By the law of large numbers, the average of the results obtained from a large number of trials should be close to the expected value. The expected score on a single question, on the other hand, is clearly $\frac{1}{C}$ (for each question there is a $\frac{1}{C}$ probability of scoring $1$ and a $\frac{C-1}{C}$ probability of scoring $0$), and by linearity of expectation the expected fraction of correct answers over the whole test is $\frac{1}{C}$ as well.
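A small simulation can illustrate the law-of-large-numbers argument. This is only a sketch under the question's assumptions (every student guesses uniformly and independently); the function name `class_average` and the fixed seed are choices made for this example.

```python
import random

def class_average(n_students, n_questions, n_choices, seed=0):
    """Fraction of correct answers over all students when everyone guesses."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    # One answer key per test, shared by all students.
    key = [rng.randrange(n_choices) for _ in range(n_questions)]
    correct = 0
    for _ in range(n_students):
        for answer in key:
            # Each guess is uniform over the available choices.
            if rng.randrange(n_choices) == answer:
                correct += 1
    return correct / (n_students * n_questions)

avg = class_average(n_students=1000, n_questions=10, n_choices=3)
print(avg)  # expected to land near 1/3
```

Changing `n_choices` moves the value the average settles around, while changing `n_students` or `n_questions` only changes how tightly it clusters there.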

Question 4

Changing $C$ directly changes the expected average score. The numbers of questions and students affect only the dispersion of the average score, not its expected value.
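To make the dispersion concrete, here is a short derivation. It assumes that all $NQ$ guesses are independent and each is correct with probability $\frac{1}{C}$; the symbols $X_{ij}$ (indicator that student $i$ gets question $j$ right) and $\bar{X}$ (the class average) are introduced here for illustration only.

$$\bar{X} = \frac{1}{NQ}\sum_{i=1}^{N}\sum_{j=1}^{Q} X_{ij}, \qquad \mathbb{E}[\bar{X}] = \frac{1}{C}, \qquad \operatorname{Var}(\bar{X}) = \frac{1}{NQ}\cdot\frac{1}{C}\left(1-\frac{1}{C}\right).$$

For $Q = 10$, $C = 3$ and $N = 1000$ this gives a standard deviation of $\sqrt{\frac{2/9}{10000}} \approx 0.0047$, i.e. about half a percentage point around the expected $33.3\%$.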

Question 5

Results may vary considerably across tests, although the more questions and students there are, the less likely considerable differences become. For example, on the first test all students may happen to guess all answers correctly (which is possible, although unlikely); on the second test it is possible that none of the students guesses a single right answer.
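The effect of the number of students on the test-to-test variation can be checked empirically. The sketch below repeats a simulated test many times for a small and a large class and compares the spread of the class averages; the function name, the class sizes and the seed are assumptions made for this example.

```python
import random
import statistics

def simulate_class_averages(n_tests, n_students, n_questions, n_choices, seed=0):
    """Return the class-average score of each of n_tests simulated tests."""
    rng = random.Random(seed)
    total = n_students * n_questions
    averages = []
    for _ in range(n_tests):
        # A uniform guess is right with probability 1/n_choices; without loss
        # of generality, treat choice 0 as the correct one for every question.
        correct = sum(rng.randrange(n_choices) == 0 for _ in range(total))
        averages.append(correct / total)
    return averages

small = simulate_class_averages(200, n_students=5, n_questions=10, n_choices=3)
large = simulate_class_averages(200, n_students=200, n_questions=10, n_choices=3)

# Spread (population standard deviation) of the class average across tests.
sd_small = statistics.pstdev(small)
sd_large = statistics.pstdev(large)
print(sd_small, sd_large)
```

Both sets of averages centre on $\frac{1}{C}$, but the small class should show a visibly larger spread from one test to the next.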

Question 6

As long as the correct answers are distributed uniformly, it doesn't matter which answer you choose on a specific question: the probability of guessing correctly is $\frac{1}{C}$. And, of course, the probability of getting the right answer for the $(n+1)$-th question does not depend on which answer was chosen for the $n$-th question. Both strategies are equivalent.

The results you got in your test run may be explained by, e.g., a non-uniform distribution of correct answers (say the second answer is correct for 50% of the questions, while the first and the third answers are each correct for 25% of the questions) combined with checking against an unlucky fixed choice (e.g. the first one). In such a case, choosing a random answer gives an expected score of 33.3%, while always choosing the first answer gives an expected score of only 25%.
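The two strategies can be compared exactly for any distribution of correct answers. The following sketch uses the 25% / 50% / 25% example from the paragraph above; the function names are hypothetical and the answer-key distribution is just that example, not data from the asker's simulation.

```python
from fractions import Fraction

def expected_score_random(key_dist):
    """Expected per-question score when guessing uniformly at random.
    key_dist[i] is the probability that choice i is the correct answer."""
    c = len(key_dist)
    return sum(Fraction(1, c) * p for p in key_dist)

def expected_score_fixed(key_dist, choice):
    """Expected per-question score when always picking the same choice."""
    return key_dist[choice]

# Non-uniform key: the second answer is correct half of the time.
key_dist = [Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)]

random_score = expected_score_random(key_dist)     # 1/3 for any key_dist
first_score = expected_score_fixed(key_dist, 0)    # 1/4: the "unlucky" choice
second_score = expected_score_fixed(key_dist, 1)   # 1/2: the "lucky" choice
print(random_score, first_score, second_score)
```

Random guessing always yields $\frac{1}{C}$ because the key probabilities sum to 1, whereas a fixed choice does better or worse depending on how often that choice happens to be correct. With a uniform key, every strategy gives $\frac{1}{C}$.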

Related Solutions

[Math] Optimal passing score on a test/exam to ensure minimal influence of chance

If you were to determine a specific set of constraints, there is no reason why you couldn't use the simplex method.

In other words, try to set up the problem as a set of linear equations depending on the passing scores of questions.

[Math] Probability of passing test by choosing answers randomly

The idea is to construct a probability distribution on the 10 K-Prim type questions. The probability distribution for a single question is $$\begin{align*} p_0 = \Pr[S = 0] &= \sum_{k=0}^2 \binom{4}{k} (0.5)^k (0.5)^{4-k} = \frac{11}{16}, \\ p_1 = \Pr[S = 0.5] &= \binom{4}{3} (0.5)^3 (0.5)^1 = \frac{1}{4}, \\ p_2 = \Pr[S = 1] &= \binom{4}{4} (0.5)^4 (0.5)^0 = \frac{1}{16}. \end{align*} $$ This assumes that for each such question, the choice of True/False is equally likely for each of the four answers, and that each answer is independent, so the number of correct answers follows a ${\rm Binomial}(4,0.5)$ distribution.

Next, the distribution of the sum of the scores of 10 K-Prim questions can be derived from the multinomial distribution, though it is somewhat tedious to compute: let $X_0$, $X_1$, $X_2$ be random variables that count the number of $0$-point, $0.5$-point, and $1$-point scores out of the 10 questions. Then $$\Pr[(X_0, X_1, X_2) = (a,b,c)] = \frac{10!}{a! b! c!} p_0^{a} p_1^b p_2^c.$$ Then we can tabulate the sum; we do this in Mathematica:

Flatten[Table[{b/2 + c, PDF[MultinomialDistribution[10, {11/16, 1/4, 1/16}], 
        {10 - b - c, b, c}]}, {b, 0, 10}, {c, 0, 10 - b}], 1]

Table[{k, Total[Select[%, #[[1]] == k &]][[2]]}, {k, 0, 10, 1/2}]

which gives us the desired probability distribution for these 10 questions. Call this random variable $K$. Now, for the remaining 20 questions, the total point count is simple; it is simply $A \sim {\rm Binomial}(20, 0.2)$. So the probability that the total score is at least $18$ out of $30$ is $$\sum_{k=0}^{20} \Pr[K = k/2]\Pr[A \ge 18 - k/2].$$ Again, we use Mathematica:

Sum[%[[k, 2]] (1 - CDF[BinomialDistribution[20, 1/5], 18 - k/2]),
    {k, 1, Length[%]}]

This gives us $$\frac{8327843221553613}{2^9 \cdot 10^{20}} \approx 1.62653 \times 10^{-7}.$$ This is so small that it is unlikely that a naive simulation approach will be able to approximate it.
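For readers without Mathematica, the same calculation can be sketched in Python with exact fractions. This is an independent re-implementation under the assumptions stated above (10 K-Prim questions with the single-question distribution $p_0, p_1, p_2$, plus 20 questions that are each guessed correctly with probability $0.2$, and a pass mark of 18 points); the helper `convolve` and the half-point bookkeeping are choices made for this example.

```python
from fractions import Fraction
from math import comb

# Single K-Prim question, with the score counted in half-points:
# 0 (at most 2 items right), 1 (3 right = 0.5 points), 2 (all 4 right = 1 point).
single = {0: Fraction(11, 16), 1: Fraction(1, 4), 2: Fraction(1, 16)}

def convolve(dist_a, dist_b):
    """Distribution of the sum of two independent discrete random variables."""
    out = {}
    for a, pa in dist_a.items():
        for b, pb in dist_b.items():
            out[a + b] = out.get(a + b, Fraction(0)) + pa * pb
    return out

# K: total half-points from the 10 K-Prim questions.
k_dist = {0: Fraction(1)}
for _ in range(10):
    k_dist = convolve(k_dist, single)

# A ~ Binomial(20, 1/5): points from the remaining 20 questions.
p = Fraction(1, 5)
a_dist = {a: comb(20, a) * p**a * (1 - p)**(20 - a) for a in range(21)}

# Pass if K + A >= 18 points, i.e. half_points + 2 * a >= 36.
p_pass = sum(pk * pa
             for h, pk in k_dist.items()
             for a, pa in a_dist.items()
             if h + 2 * a >= 36)
print(p_pass, float(p_pass))
```

The printed value can be compared directly with the fraction quoted above. Working in half-points keeps every score an integer, which avoids floating-point keys in the dictionaries.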
