[Math] Strategies to guess choices for multiple choice questions

probability

Multiple choice questions (MCQs) are common in examinations over here in Singapore. A set of, say, $40$ questions is given to students, and each question comes with a list of $4$ answer choices: $A$, $B$, $C$ or $D$.

Suppose a student did not study for a particular test. Out of desperation, he decides to "guess" a choice for each question to secure at least some points. He considers two possible strategies:

  1. For each question, he picks a random choice as his answer.
  2. He picks one random choice, say $B$, and uses it for all his answers.

Assuming the correct answers are randomly distributed and his guess is completely random, which strategy would give a higher probability of securing more correct answers than the other?

Intuitively, the second strategy would give a higher probability of securing more marks. However, I am unable to come up with a mathematical proof (or disproof).

For the second strategy, we only need consider the probability that the correct answer is the choice that was picked. Under the assumption that the correct answers are randomly distributed, the probability of a particular choice being the correct answer is $\frac{1}{4}$. Hence, the student would get approximately $25\%$ of his guesses correct.

However, we can also use the same argument for the first strategy to say that the student would also get approximately $25\%$ of his guesses correct. This would imply that both strategies are equally effective, but I am pretty sure the second strategy is more effective.


EDIT: In order to prevent psychological factors from distorting reality, I decided to write a program (in C#) that simulates the aforementioned MCQ tests. I configured the program to simulate the taking of $1000$ randomly generated MCQ tests with $40$ questions each using both strategies.

It turns out that the percentage scores for both strategies have the same average ($\approx25\%$) and the same standard deviation ($\approx 6.84$)!

Code:

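using System;
using System.Linq; // needed for Count() in Strategy2

// The members below belong inside the program's class.
// Shared random number generator, seeded from the clock.
static Random rng = new Random((int)DateTime.Now.Ticks);

// Fills the answer key with uniformly random choices 0..3 (A..D).
static void GenerateAnswers(int[] answers)
{
    for (int i = 0; i < answers.Length; i++)
    {
        answers[i] = rng.Next(4);
    }
}

// Strategy 1: draw a fresh random guess for every question.
static int Strategy1(int[] answers)
{
    int score = 0;
    foreach (int answer in answers)
    {
        if (rng.Next(4) == answer)
        {
            score++;
        }
    }
    return score;
}

// Strategy 2: draw one random choice and use it for every question.
static int Strategy2(int[] answers)
{
    int choice = rng.Next(4);
    return answers.Count(x => x == choice);
}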
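
The driver that runs the $1000$ tests and computes the statistics is not shown above. A minimal sketch of what it might look like follows; the class name `McqSimulation`, the helpers `Mean` and `StdDev`, the constants `Questions` and `Tests`, and the fixed seed are all assumptions made for illustration, not the original program.

```csharp
using System;
using System.Linq;

static class McqSimulation
{
    // Fixed seed so runs are repeatable (an assumption; the original seeds from the clock).
    static readonly Random rng = new Random(12345);

    const int Questions = 40;   // questions per test
    const int Tests = 1000;     // number of simulated tests

    // Fills the answer key with uniformly random choices 0..3 (A..D).
    static void GenerateAnswers(int[] answers)
    {
        for (int i = 0; i < answers.Length; i++)
        {
            answers[i] = rng.Next(4);
        }
    }

    // Strategy 1: a fresh random guess for every question.
    static int Strategy1(int[] answers)
    {
        return answers.Count(answer => rng.Next(4) == answer);
    }

    // Strategy 2: one random choice reused for every question.
    static int Strategy2(int[] answers)
    {
        int choice = rng.Next(4);
        return answers.Count(x => x == choice);
    }

    // Arithmetic mean of the samples.
    public static double Mean(double[] xs)
    {
        return xs.Average();
    }

    // Population standard deviation (divides by N, not N - 1).
    public static double StdDev(double[] xs)
    {
        double mean = xs.Average();
        return Math.Sqrt(xs.Select(x => (x - mean) * (x - mean)).Average());
    }

    static void Main()
    {
        var scores1 = new double[Tests];
        var scores2 = new double[Tests];
        var answers = new int[Questions];

        for (int t = 0; t < Tests; t++)
        {
            // Both strategies are scored against the same answer key.
            GenerateAnswers(answers);
            scores1[t] = 100.0 * Strategy1(answers) / Questions;
            scores2[t] = 100.0 * Strategy2(answers) / Questions;
        }

        Console.WriteLine($"Strategy 1: mean {Mean(scores1):F2}%, sd {StdDev(scores1):F2}");
        Console.WriteLine($"Strategy 2: mean {Mean(scores2):F2}%, sd {StdDev(scores2):F2}");
    }
}
```

Scores are converted to percentages before the statistics are taken, which is presumably how the figures quoted above were obtained. The exact numbers will vary with the seed and the number of tests.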

Best Answer

The random variable here is the number of correct answers; call it $X$.

1) $$\rm E[X]=\sum_{k=0}^{40} P(X=k)k=\sum_{k=0}^{40}\binom{40}k\left(\frac14\right)^k\left(\frac34\right)^{40-k}k=10$$ This is $25\%$. Now the standard deviation is: $$\rm \sigma[X]=\sqrt{\sum_{k=0}^{40}(k-10)^2\binom{40}k\left(\frac14\right)^k\left(\frac34\right)^{40-k}}=\sqrt{\frac{15}2}\approx2.73861$$


2) $$\rm E[X]=\sum_{k=0}^{40} P(X=k)k=\sum_{k=0}^{40}\binom{40}k\left(\frac14\right)^k\left(\frac34\right)^{40-k}k=10$$ This is $25\%$. Now the standard deviation is: $$\rm \sigma[X]=\sqrt{\sum_{k=0}^{40}(k-10)^2\binom{40}k\left(\frac14\right)^k\left(\frac34\right)^{40-k}}=\sqrt{\frac{15}2}\approx2.73861$$


Sorry, but do you know why I did that? Because $\rm P(X=k)$ is the same for both strategies, since the questions are mutually independent. I would further say that any strategy you devise would get you the same results. I can't prove it, but you can call it my intuition. I feel like publishing this theory of independent results! :D
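
The intuition can in fact be made precise. Here is a sketch, assuming (as the question does) that the correct answers $Y_1,\dots,Y_{40}$ are independent and uniform on $\{A,B,C,D\}$, and that the guesses $g_1,\dots,g_{40}$ are chosen without any information about them. The symbols $Y_i$, $g_i$ and $I_i$ are introduced here only for illustration. Let $I_i$ indicate a correct guess on question $i$:

$$I_i=\mathbf 1[Y_i=g_i],\qquad P(I_i=1\mid g_1,\dots,g_{40})=P(Y_i=g_i\mid g_1,\dots,g_{40})=\frac14 .$$

Given the guesses, the $I_i$ depend only on the independent $Y_i$, so they are independent Bernoulli$\left(\frac14\right)$ variables whatever the guesses happen to be (all different, all $B$, or any other pattern). Hence

$$X=\sum_{i=1}^{40}I_i,\qquad P(X=k\mid g_1,\dots,g_{40})=\binom{40}k\left(\frac14\right)^k\left(\frac34\right)^{40-k}.$$

The right-hand side does not depend on the guesses, so it is also the unconditional $P(X=k)$. This covers strategy 1 (independent random $g_i$), strategy 2 ($g_1=\dots=g_{40}$), and any other strategy that does not peek at the answers. The sums above also have the usual binomial closed forms:

$$\mathrm E[X]=40\cdot\frac14=10,\qquad \sigma[X]=\sqrt{40\cdot\frac14\cdot\frac34}=\sqrt{\frac{15}2}\approx 2.7386,\qquad \frac{100\cdot 2.7386}{40}\approx 6.85 .$$

The last figure is the standard deviation of the percentage score, which is in line with the $\approx 6.84$ reported by the simulation in the question.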