Proportion – Calculating Standard Deviation of Sampling Proportion Given Population Proportion

proportion;sample

So the question is as follows (paraphrased):

A population votes for/against a political candidate. In a sample of 200 voters, let $X$ be the fraction of people who vote for this candidate.

Population size is unknown

Population proportion who vote for is known to be $0.65$

What is the standard deviation of the sample proportion, i.e. the standard deviation of $X$?

My approach:

$X \sim B(200,p)$

$E(X)=0.65=np$

$\rightarrow p=0.65/200= 0.00325$

$\sigma_X=\sqrt{200\cdot 0.00325\cdot0.99675}=0.8049$

Then I read somewhere that the standard deviation of a sample proportion is $\sqrt{\displaystyle\frac{pq}{n}}$, which isn't the same as the one in my approach. Is this because $\sqrt{\displaystyle\frac{pq}{n}}$ is used for estimating the true population proportion when it's unknown (which isn't the case for my problem)?

EDIT: I've edited the question so the variables are clearer.

Best Answer

I think the end of this story is to use the number $X$ of people in favor of the candidate out of $n = 200$ people interviewed in order to get a 95% confidence interval for $p,$ the proportion of people in the population who favor the candidate. I agree with @a.statistician (+1) that you are confusing $SD(X)$ and $SD(p),$ but it seems to me you are also confusing $p$ and $\hat p,$ so there is a little more to this before you are ready to do statistical analysis.

Let $X \sim \mathsf{Binom}(n = 200, p).$ Then $E(X) = np = 200p,\, Var(X) = npq = 200pq,\,$ and $SD(X) = \sqrt{npq} = \sqrt{200pq},$ where $q = 1-p.$

Then suppose you take a poll and you observe $X = 130$ people in favor of the candidate out of $n = 200$ people interviewed. Then the estimate $\hat p$ of the population proportion $p$ in favor of the candidate is $\hat p = \frac{X}{n} = \frac{130}{200} = 0.65.$ Like $X,$ the estimator $\hat p$ is a random variable; it is based on a binomial distribution, but it is not itself binomially distributed. [Notice that this addresses an error pointed out by @jbowman, that you never corrected in your Question.]

We can find $E(\hat p) = E(X/n) = \frac 1nE(X) = \frac 1n np = p,$ $V(\hat p) = V(X/n) = \frac{1}{n^2}V(X) = \frac{1}{n^2}npq = \frac{pq}{n}.$ Then finally, $SD(\hat p) = \sqrt{\frac{pq}{n}}.$
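To make the distinction concrete, here is a minimal Python sketch that evaluates both formulas for the values in the Question ($n = 200,$ $p = 0.65$). The function names `sd_count` and `sd_proportion` are mine, chosen for illustration only.

```python
import math

def sd_count(n: int, p: float) -> float:
    """SD of the count X ~ Binom(n, p): sqrt(n*p*q)."""
    return math.sqrt(n * p * (1 - p))

def sd_proportion(n: int, p: float) -> float:
    """SD of the sample proportion p_hat = X/n: sqrt(p*q/n)."""
    return math.sqrt(p * (1 - p) / n)

n, p = 200, 0.65
print(round(sd_count(n, p), 4))       # 6.7454  (SD of the number in favor)
print(round(sd_proportion(n, p), 4))  # 0.0337  (SD of the fraction in favor)
```

The second value is the first divided by $n,$ which is exactly the $\frac{1}{n^2}$ scaling of the variance in the derivation above.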

Then if you want a 95% confidence interval for $p$ based on $\hat p,$ you can assume (for sufficiently large $n$) that $\hat p$ is approximately normally distributed. Upon standardization, this leads to saying that $Z = \frac{\hat p - p}{SD(\hat p)}$ is approximately standard normal. Thus $$P\left(-1.96 < \frac{\hat p - p}{SD(\hat p)} < 1.96 \right) \approx 0.95,$$ from which we get that a 95% CI for $p$ would be something like $\hat p \pm 1.96\sqrt{\frac{pq}{n}}.$ But since we don't know the values of $p$ and $q,$ we need to estimate them. Thus we obtain the approximate 95% confidence interval of the form $$\hat p \pm 1.96\sqrt{\frac{\hat p\hat q}{n}},$$ where $\hat q = 1 - \hat p.$ For $n = 300$ and $X = 130$ (note that this uses $n = 300,$ not the $n = 200$ of the poll above, so $\hat p = 130/300 \approx 0.433$), the 95% CI computes to $(0.377, 0.489).$
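As a rough sketch of that computation in Python (the function name `wald_ci` is my own label for this interval, and the default $z = 1.96$ is the 95% value used above):

```python
import math

def wald_ci(x: int, n: int, z: float = 1.96):
    """Approximate CI for p: p_hat +/- z * sqrt(p_hat * q_hat / n)."""
    p_hat = x / n
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

lo, hi = wald_ci(130, 300)
print(f"({lo:.3f}, {hi:.3f})")  # (0.377, 0.489)
```

Calling `wald_ci(130, 200)` instead, i.e. with the poll of 200 described earlier, gives approximately $(0.584, 0.716).$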

Note: Because this CI involves two approximations, (a) the normal approximation to the binomial and (b) the use of $\sqrt{\frac{\hat p\hat q}{n}}$ in place of $\sqrt{\frac{pq}{n}},$ it has been shown not to be very accurate for smaller values of $n$. Agresti and Coull have shown that the following adjustment improves accuracy of a 95% CI for $p:$ Use $\check p = \frac{X+2}{n+4},\, \check q = 1 - \check p,$ and $\check n = n+4,$ so that the adjusted CI becomes

$$\check p \pm 1.96\sqrt{\frac{\check p\check q}{\check n}}.$$

For $n = 300, X = 130,$ the adjusted 95% CI computes to $(0.378, 0.490).$ For $n$ as large as $300,$ the adjustment is relatively minor.
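The adjusted interval can be sketched the same way. This is only the "add 2 successes and 4 trials" form written above, which is tied to the 95% level; the function name `plus_four_ci` is hypothetical.

```python
import math

def plus_four_ci(x: int, n: int, z: float = 1.96):
    """Adjusted 95% CI: use (x + 2) / (n + 4) in place of x / n."""
    n_adj = n + 4
    p_adj = (x + 2) / n_adj
    half_width = z * math.sqrt(p_adj * (1 - p_adj) / n_adj)
    return p_adj - half_width, p_adj + half_width

lo, hi = plus_four_ci(130, 300)
print(f"({lo:.3f}, {hi:.3f})")  # (0.378, 0.490)
```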
