T-test for Bernoulli distribution: sample or population data for SE calculation

Tags: bernoulli-distribution, hypothesis testing, population, sample, t-test

I am struggling to understand part of the answer to a question I have done:

Question: In a given population, 11% of the likely voters are African American. A survey using a simple random sample of 600 landline telephone numbers finds 8% African Americans. Is there evidence that the survey is biased?

I found the question quite simple to answer. I set up
H0: Survey is random
H1: Survey is biased
$\hat p=0.08$ and $p=0.11$

I calculated my t value using $t=(\hat p - p)/SE(\hat p)$

where $SE(\hat p)= (\hat p(1- \hat p)/n)^{1/2}$

and got a t value of $|t| \approx 2.72$, so I rejected the null, as the p value was less than 1%.
According to the answers, my method is correct; however, it is also stated:

An alternative formula for $SE(\hat p)$ is $0.11(1-0.11)/n$, which is valid under the null hypothesis that $p=0.11$.

I imagine that the lack of a square root there is just a typo. However, am I correct in assuming that what they've done is calculate the standard error using the population value rather than the sample data? Is that acceptable, given that it would obviously produce a different t value? I'm aware that in most questions this wouldn't be possible, but for Bernoulli distributions it is.

Thanks

Best Answer

The idea of a hypothesis test is that you come up with a statistic whose distribution you know if the null hypothesis is true.

The most well-known case is the t-statistic, where you divide the sample mean minus the mean under the null by the sample standard deviation divided by $\sqrt{n}$. Some mathematical statistics then shows that this t-statistic follows a t-distribution with $n-1$ degrees of freedom if the null is true and we sample from a normal population.

Now, if the null is true, computing the standard error from $p=0.11$ is correct, because you are using the right $p$.
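For the survey numbers in the question, the two choices of standard error can be compared directly. The snippet below is a minimal sketch using only the standard library. It assumes the large-sample normal approximation for the two-sided p-values, and the helper name `normal_two_sided_p` is purely illustrative.

```python
import math


def normal_two_sided_p(z):
    """Two-sided p-value for a statistic treated as standard normal."""
    # 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2))
    return math.erfc(abs(z) / math.sqrt(2))


n = 600        # sample size from the question
p_hat = 0.08   # sample proportion
p0 = 0.11      # proportion under the null hypothesis

# Standard error estimated from the sample proportion
se_sample = math.sqrt(p_hat * (1 - p_hat) / n)
# Standard error computed from the null value of p
se_null = math.sqrt(p0 * (1 - p0) / n)

t_sample = (p_hat - p0) / se_sample
t_null = (p_hat - p0) / se_null

p_sample = normal_two_sided_p(t_sample)
p_null = normal_two_sided_p(t_null)

print(f"sample-based SE: t = {t_sample:.3f}, p-value = {p_sample:.4f}")
print(f"null-based SE:   t = {t_null:.3f}, p-value = {p_null:.4f}")
```

With these inputs the sample-based statistic is about -2.71 (close to the 2.72 in the question, up to sign and presumably rounding of the standard error) and the null-based one is about -2.35. The two-sided p-value is below 1% in the first case and between 1% and 5% in the second, so the choice of standard error can matter at a strict significance level.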

Your test statistic can be written as
$$ t=\frac{\sqrt{n}(\hat p - p)}{\sqrt{\hat p(1-\hat p)}}. $$

By the CLT, because $p=E(X_i)$ (with $\hat p=\frac{1}{n}\sum_i X_i$) and assuming random sampling,
$$\sqrt{n}(\hat p - p)\to_d N(0,Var(X_i)).$$

But $Var(X_i)=p(1-p)$ for such a Bernoulli random variable, so that
$$\frac{\sqrt{n}(\hat p - p)}{\sqrt{p(1-p)}}\to_d N(0,1),$$
i.e., the statistic computed with the null value of $p$ will behave like a standard normal r.v. in large samples, if the null is true.

Now, because $\hat p\to_p p$ by the law of large numbers, it is also correct to use $\hat p$: replacing the true $p(1-p)$ by a consistent estimator of this quantity does not alter the asymptotic distribution.
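One informal way to check this is a small Monte Carlo experiment under the null: draw many samples with $p=0.11$ and $n=600$, and count how often each version of the statistic exceeds 1.96 in absolute value. The sketch below uses only the standard library. The seed, the number of replications and the function name `rejection_rates` are arbitrary choices for illustration, not part of the original argument.

```python
import math
import random


def rejection_rates(p0=0.11, n=600, reps=2000, crit=1.96, seed=0):
    """Simulate under the null and return the rejection frequencies of the
    null-SE and sample-SE statistics at the two-sided critical value."""
    rng = random.Random(seed)
    se_null = math.sqrt(p0 * (1 - p0) / n)  # fixed under the null
    reject_null_se = 0
    reject_sample_se = 0
    for _ in range(reps):
        # One Bernoulli(p0) sample of size n, summarised by its proportion
        successes = sum(rng.random() < p0 for _ in range(n))
        p_hat = successes / n
        if abs(p_hat - p0) / se_null > crit:
            reject_null_se += 1
        se_sample = math.sqrt(p_hat * (1 - p_hat) / n)
        # A zero estimated SE (p_hat of 0 or 1) is counted as a rejection
        if se_sample == 0 or abs(p_hat - p0) / se_sample > crit:
            reject_sample_se += 1
    return reject_null_se / reps, reject_sample_se / reps


rate_null_se, rate_sample_se = rejection_rates()
print(f"rejection rate with null-based SE:   {rate_null_se:.3f}")
print(f"rejection rate with sample-based SE: {rate_sample_se:.3f}")
```

If both statistics are approximately standard normal under the null, both frequencies should land near the nominal 5%, up to simulation noise and the discreteness of the binomial count.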

(This result is known, at least in econometrics, as Slutsky's theorem, which says that the product of two sequences, one converging in distribution and the other converging in probability to a constant, converges in distribution to the product of the limits: "the weaker convergence mode dominates".)
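Applied to the statistic here, the argument can be sketched by splitting $t$ into two factors (using the notation above, and assuming $0<p<1$ so that the ratio is well defined in the limit):

$$ t=\frac{\sqrt{n}(\hat p - p)}{\sqrt{\hat p(1-\hat p)}} = \underbrace{\frac{\sqrt{n}(\hat p - p)}{\sqrt{p(1-p)}}}_{\to_d N(0,1)} \cdot \underbrace{\sqrt{\frac{p(1-p)}{\hat p(1-\hat p)}}}_{\to_p 1} \to_d N(0,1). $$

The first factor is the CLT result, and the second converges in probability to 1 because $\hat p\to_p p$ and the map $x\mapsto\sqrt{p(1-p)/(x(1-x))}$ is continuous at $x=p$.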
