[Math] Maximizing the probability of a poll prediction

central limit theorem, law-of-large-numbers, normal distribution, probability, statistics

Using the central limit theorem, I was able to work out the first part of this question, but part (b) is eluding me. In general, how do I find a value of $n$ that ensures a given probability that the poll predicts correctly (i.e., 95% certainty that the poll shows a majority for candidate A)? I'm sure it has something to do with the CLT or the LLN, but I can't see how. The equations at the bottom were my best guess at a relationship between the probability and $n$, but I cannot solve them.

Situation: In a large voting population, 56% of the voters prefer candidate A to candidate B. The true percentage is not known to either candidate, and candidate A commissions a poll of 100 voters to determine whether or not he will win.

a) Using the Central Limit Theorem, determine the probability that the poll will correctly predict his winning the election.

answer. Let $X_i$, $i = 1,\ldots,100$ represent the outcome of the vote of the $i$-th person polled, where $X_i = 1$ if they prefer candidate A (with probability 0.56) and $X_i = 0$ otherwise. Let $S_n$ be the sum of all $X_i$. Then the probability that the poll will correctly predict candidate A winning the election is
\begin{align*}
P(S_n > 50) &= 1 - P(S_n \leq 50)
\end{align*}
It can be shown that $E[S_n] = n\mu = 56$ and $SD(S_n) = \sigma\sqrt{n} = 7.48$. Using the central limit theorem, we can approximate $Z = \frac{S_n - n\mu}{\sigma\sqrt{n}}$ as the Standard Normal, so
\begin{align*}
P(S_n \leq 50) &= P(Z \leq \frac{50 - n\mu}{\sigma\sqrt{n}}) \\
&= P(Z \leq \frac{50 - 56}{7.48}) \\
&= \Phi(-0.8021) \\
&= 0.2119
\end{align*}
Finally, the probability that the poll will correctly predict candidate A winning the election is $P(S_n > 50) = 1 - P(S_n \leq 50) = 0.7881$.

b) What sample size would be required to ensure at least a 95% chance of a correct prediction?

answer. From part a, we can use the following inequality to solve for $n$:
\begin{align*}
0.95 &\leq 1 - P(S_n \leq 50) \\
0.05 &\geq P(S_n \leq 50) \\
&\geq \Phi(\frac{50 - n(0.56)}{(0.784)\sqrt{n}})
\end{align*}

Note: I also attempted to use Markov's inequality to try to come up with a solution, but that proved just as difficult.

EDIT: Thanks commenter Andre for pointing out that the last equation should use $P(S_n \leq n/2)$ instead of using 50. The equation is then
$$
0.95 \leq 1 - P(S_n \leq n/2)
$$

Best Answer

We use essentially your notation. To determine the sample size, note that we want $$\Pr(S_n\lt 0.5n)\approx 0.05.$$ We will need to find the standard deviation of $S_n$. The variance is $(0.56)(0.44)n$, so the standard deviation is about $0.4964\sqrt{n}$. (Please note that in the case $n=100$ the wrong standard deviation was used: part (a) takes $7.48$, whereas $\sqrt{(0.56)(0.44)(100)}\approx 4.964$.)
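To see how much the standard deviation matters for $n=100$, here is a minimal Python sketch of the normal approximation using the variance $(0.56)(0.44)n$. The function name `prob_correct_prediction` is made up for illustration, and no continuity correction is applied, to stay close to the calculation in the question.

```python
from math import sqrt
from statistics import NormalDist  # standard library, Python 3.8+


def prob_correct_prediction(n: int, p: float = 0.56) -> float:
    """CLT approximation to P(S_n > n/2), where S_n ~ Binomial(n, p)."""
    mean = n * p
    sd = sqrt(n * p * (1 - p))   # about 0.4964 * sqrt(n) when p = 0.56
    z = (n / 2 - mean) / sd      # standardized majority threshold
    return 1 - NormalDist().cdf(z)


print(round(prob_correct_prediction(100), 4))
```

With the standard deviation near $4.964$, the threshold sits at about $z=-1.21$, so the approximate probability for $n=100$ comes out close to $0.89$ rather than the $0.7881$ obtained with $7.48$.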

We have $$\Pr(S_n\lt 0.5n)\approx \Phi\left(\frac{0.5n-0.56n}{0.4964\sqrt{n}}\right).$$ And $\Phi(a)=0.05$, from tables, occurs when $a\approx -1.645$.

So, simplifying a little, we find that $$\frac{0.06\sqrt{n}}{0.4964}\approx 1.645.$$ Now we can find the appropriate $n$.
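The last step can be carried out numerically. The sketch below solves $0.06\sqrt{n}/\sigma = z$ for $n$ and rounds up; `required_sample_size` is a hypothetical helper name, and the quantile comes from the standard library's `NormalDist.inv_cdf` instead of a table, so $z$ is about $1.6449$ rather than exactly $1.645$.

```python
from math import ceil, sqrt
from statistics import NormalDist  # standard library, Python 3.8+


def required_sample_size(p: float = 0.56, confidence: float = 0.95) -> int:
    """Smallest n satisfying the normal-approximation condition
    (p - 0.5) * sqrt(n) / sigma >= z, with z the `confidence` quantile."""
    sigma = sqrt(p * (1 - p))               # per-voter standard deviation
    z = NormalDist().inv_cdf(confidence)    # about 1.645 for 0.95
    root_n = z * sigma / (p - 0.5)          # solve for sqrt(n)
    return ceil(root_n ** 2)                # round up to a whole voter


print(required_sample_size())
```

This gives $\sqrt{n}\approx 13.61$, so $n\approx 185.2$, and a poll of $186$ voters meets the 95% target under the approximation.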
