In the case of sample proportions, why do we not get a $t$-distribution when we estimate the standard deviation $\sigma_{\hat{p}}$?

If $\bar{x}$ has a normal distribution (or is approximately normal via the CLT), then:

$z=\frac{\bar{x}-\mu}{\frac{\sigma}{\sqrt{n}}}$ (has a z-distribution)

If we substitute the sample standard deviation $s$ for the population standard deviation $\sigma$, we get a $t$-distribution with $n-1$ degrees of freedom:

$t=\frac{\bar{x}-\mu}{\frac{s}{\sqrt{n}}}$ (has a $t$-distribution)
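As a concrete illustration of the statistic for means, with $s$ in the denominator, here is a minimal sketch using only the Python standard library. The function name `t_statistic` and the sample values are made up for illustration.

```python
from math import sqrt
from statistics import mean, stdev

def t_statistic(sample, mu):
    """Return (x_bar - mu) / (s / sqrt(n)) for a list of observations."""
    n = len(sample)
    x_bar = mean(sample)
    s = stdev(sample)  # sample standard deviation (divides by n - 1)
    return (x_bar - mu) / (s / sqrt(n))

sample = [1, 2, 3, 4, 5]  # hypothetical data
t = t_statistic(sample, mu=2)
print(t)
```

Note that the numerator uses the sample only through $\bar{x}$, while the denominator uses it only through $s$.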

Now, consider the sample proportion random variable $\hat{p}$. Then we have that:

$\mu_{\hat{p}}=p$, where $p$ is the actual population proportion

$\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}$

If $\hat{p}$ has a normal distribution, then:

$z = \frac{\hat{p}-\mu_{\hat{p}}}{\sigma_{\hat{p}}}$ has a z-distribution.

Now, in the former case we estimated the population standard deviation $\sigma$ by using the sample standard deviation $s$; doing this resulted in going from a $z$-distribution to a $t$-distribution. In the current case, if we don't know the population proportion $p$, we can estimate $p$ (and thus estimate $\sigma_{\hat{p}}$) by $\hat{p}$.

Thus, based on what happens in the former case, one might suspect that the random variable:

$\frac{\hat{p}-\mu_{\hat{p}}}{\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}}$

has a $t$-distribution.

However, this is not the case, and I'd like to know why.

Why, in the first case, when we estimate the population standard deviation, do we get a $t$-distribution, but in the second case we get a random variable that converges to a $z$-distribution (without ever having a chance to be a $t$-distribution)?

Does this difference have to do with the fact that in the former case the numerator and denominator are independent, whilst in the latter case they are not?
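The dependence in the proportion case can be checked numerically. For 0/1 data the usual sample variance satisfies $s^2 = \frac{n}{n-1}\hat{p}(1-\hat{p})$, so the denominator is a deterministic function of $\hat{p}$ and cannot be independent of the numerator. The sketch below verifies that identity on made-up data; the helper name `sample_variance_from_p_hat` is hypothetical.

```python
from statistics import mean, variance

def sample_variance_from_p_hat(p_hat, n):
    """Sample variance of 0/1 data, computed from p_hat alone."""
    return n / (n - 1) * p_hat * (1 - p_hat)

sample = [1, 0, 0, 1, 1, 0, 1, 0, 0, 0]  # hypothetical 0/1 observations
n = len(sample)
p_hat = mean(sample)                      # sample proportion of ones
s2_direct = variance(sample)              # usual sample variance (divides by n - 1)
s2_from_p_hat = sample_variance_from_p_hat(p_hat, n)
print(s2_direct, s2_from_p_hat)
```

The two printed values agree up to floating-point rounding, which is the point: once $\hat{p}$ is known, the estimated standard deviation carries no further information.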

Best Answer

Here's a fact:

A random variable $T$ has a $t$-distribution if $T = \frac{Z}{\sqrt{V/\nu}}$, where $Z$ is standard normal, $V$ is chi-square distributed with $\nu$ degrees of freedom, and $Z$ and $V$ are independent.

Now note that the numerator $\hat{p} - p$ is not normal: $n\hat{p}$ is binomial, so $\hat{p}$ is discrete and only approximately normal for large $n$. Thus, the ratio cannot be exactly $t$-distributed. Furthermore, in large samples, the $t$-distribution is arbitrarily close to the normal distribution.
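To make the discreteness concrete, the exact distribution of the ratio from the question can be enumerated for a small $n$. This is a sketch under the assumption that $n\hat{p}$ is binomial with parameters $n$ and $p$; the function name and the choice $n = 10$, $p = 0.3$ are purely illustrative.

```python
from math import comb, sqrt

def studentized_proportion_distribution(n, p):
    """Exact distribution of (p_hat - p) / sqrt(p_hat * (1 - p_hat) / n).

    Returns (support, undefined_prob). `support` maps each attainable value
    of the statistic to its probability. `undefined_prob` is the probability
    that p_hat is 0 or 1, where the estimated standard error is zero and the
    ratio is undefined.
    """
    support = {}
    undefined_prob = 0.0
    for k in range(n + 1):
        prob = comb(n, k) * p**k * (1 - p) ** (n - k)  # binomial pmf at k
        if k == 0 or k == n:
            undefined_prob += prob
            continue
        p_hat = k / n
        se_hat = sqrt(p_hat * (1 - p_hat) / n)
        value = (p_hat - p) / se_hat
        support[value] = support.get(value, 0.0) + prob
    return support, undefined_prob

support, undefined_prob = studentized_proportion_distribution(n=10, p=0.3)
for value, prob in sorted(support.items()):
    print(f"{value:8.3f}  {prob:.4f}")
print("P(statistic undefined) =", undefined_prob)
```

The statistic takes at most $n - 1$ distinct values (nine here) and is undefined with positive probability, whereas a $t$-distributed variable is continuous on the whole real line.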
