Solved – Hypothesis Testing for survey

Tags: binomial-distribution, hypothesis-testing, r

Suppose my null hypothesis is “Not more than 50% of people prefer milk” and my alternate hypothesis is “More than 50% (a majority) prefer milk”. The hypotheses are set up only to test whether or not a majority prefer milk.

Then I set the question in the questionnaire as:
"Do you prefer milk? Yes$\,\Box\:\:$ No$\,\Box\:\:$ Not sure$\,\Box\,$".

After a survey of 100 respondents, I found that 40% ticked ‘Yes’, 57% ticked ‘No’, and the rest (3%) ticked ‘Not sure’.

To test the hypothesis, I ran a one-sided binomial test in R using the command:

    binom.test(40, 100, p=0.5, alternative="greater")

Then, on the basis of the test result, I rejected the alternate hypothesis and accepted the null hypothesis. So I concluded “Not more than 50% of people prefer milk”, but not that “The majority don’t prefer milk”.

So I want to conclude that “The majority don’t prefer milk” from a ‘Yes/No/Not sure’ question, where there are three response options rather than two (Yes/No). How should the reasoning be narrated after that test? Or what statistical test should I perform, and how?

Best Answer

You don't "reject the alternate hypothesis". You either reject the null or fail to reject it.

It's perfectly possible that you can conclude neither "a majority prefer milk" nor "a majority do not prefer milk", because you cannot distinguish the proportion who prefer milk from 50%.
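A minimal R sketch of this situation follows. It uses a hypothetical sample in which 52 of 100 respondents answer 'Yes'; these counts are illustrative and not from the question. Neither one-sided test rejects at the 5% level, so the data support neither "majority" claim.

```r
# Hypothetical data (illustrative only): 52 of 100 respondents answer 'Yes'
yes <- 52
n   <- 100

# H1: more than half of the population would answer 'Yes'
p_greater <- binom.test(yes, n, p = 0.5, alternative = "greater")$p.value

# H1: fewer than half of the population would answer 'Yes'
p_less <- binom.test(yes, n, p = 0.5, alternative = "less")$p.value

# Both p-values are above 0.05, so neither direction is established
c(greater = p_greater, less = p_less)
```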

If you set up your hypotheses as:

$H_0$: Not more than 50% of the population prefer milk
$H_1$: More than 50% prefer milk

Then rejection of the null would lead to the conclusion that more than 50% prefer milk.

One must be very careful with hypothesis tests (especially one-sided ones). Failure to reject the null is not a positive assertion that the proportion who prefer milk is $\leq$ 50%.

You should explicitly clarify whether your hypothesis counts 'Not sure' as 'don't prefer'. It's probably better still to state the hypotheses explicitly in terms of answers to the survey question and then interpret them in the conclusion. Consider this phrasing:

Let $p_Y$ be the population proportion who would answer 'Yes' in response to 'Do you prefer milk?'

$H_0$: $p_Y\leq \frac{1}{2}$
$H_1$: $p_Y> \frac{1}{2}$

This phrasing avoids the ambiguity in your question. Which majority is being considered is then interpreted in the text, either before the test or in the conclusion, or both.
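In R, this pair of hypotheses is exactly the `binom.test` call from the question. The sketch below uses the 40 'Yes' answers out of 100. A large p-value here means only that the sample gives no support to $p_Y > \frac{1}{2}$. It does not establish $p_Y \leq \frac{1}{2}$.

```r
# From the question: 40 of 100 respondents answered 'Yes'
n_yes <- 40
n     <- 100

# H0: p_Y <= 1/2  versus  H1: p_Y > 1/2
res_yes <- binom.test(n_yes, n, p = 0.5, alternative = "greater")

res_yes$p.value  # well above 0.05: fail to reject H0
```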

Alternatively, if you set up your hypotheses as:

Let $p_N$ be the population proportion who would answer 'No' in response to 'Do you prefer milk?'

$H_0$: $p_N\leq \frac{1}{2}$
$H_1$: $p_N> \frac{1}{2}$

Then rejection of the null would lead to the conclusion that more than 50% do not prefer milk.
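As a sketch, this version tests the 57 'No' answers from the question on their own and ignores 'Not sure'. With these counts the one-sided p-value stays above 0.05. So on 'No' alone, the data do not establish that more than half would answer 'No'.

```r
# From the question: 57 of 100 respondents answered 'No'
n_no <- 57
n    <- 100

# H0: p_N <= 1/2  versus  H1: p_N > 1/2
res_no <- binom.test(n_no, n, p = 0.5, alternative = "greater")

res_no$p.value  # above 0.05: fail to reject H0 at the 5% level
```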

If, instead, you want to investigate the sum of the two 'not-yes' categories, you'd say:

Let $p$ be the population proportion who would answer either 'No' or 'Not sure' in response to 'Do you prefer milk?'

$H_0$: $p\leq \frac{1}{2}$
$H_1$: $p> \frac{1}{2}$

Then rejection of the null would lead to the conclusion that more than 50% would not say they prefer milk. [If that was the test you were trying to carry out, I suggest you state it as explicitly as that, so that the ambiguities inherent in ordinary language are avoided.]
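A short R sketch of this 'not-yes' test follows, pooling the question's 57 'No' and 3 'Not sure' answers. Because every respondent gave exactly one of the three answers, testing $p > \frac{1}{2}$ on the 60 'not-yes' answers is the same as testing $p_Y < \frac{1}{2}$ on the 40 'Yes' answers. The code checks that the two p-values agree.

```r
n <- 100
n_not_yes <- 57 + 3   # 'No' plus 'Not sure', from the question

# H0: p <= 1/2  versus  H1: p > 1/2, where p is the 'not-yes' proportion
res_not_yes <- binom.test(n_not_yes, n, p = 0.5, alternative = "greater")
res_not_yes$p.value   # below 0.05: reject H0 at the 5% level

# Equivalent formulation: H1: p_Y < 1/2 using the 40 'Yes' answers
res_yes_less <- binom.test(40, n, p = 0.5, alternative = "less")
all.equal(res_not_yes$p.value, res_yes_less$p.value)
```

The conclusion is about people who would not answer 'Yes'. It says nothing about 'No' answers on their own.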

A warning

One-sided hypotheses would normally be based on a clear a priori reason to think that an effect in the other direction could not occur or, at the very least, would be of no interest. In general, if you don't have that, you'd use a two-sided test.

If you don't have good prior justification for a one-sided test, you'll likely be seen as significance hunting.
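To see why the choice matters, the sketch below reuses the 60 'not-yes' answers out of 100 from the question. For a null value of 0.5 the binomial distribution is symmetric, so the two-sided p-value is twice the one-sided one. With these counts that doubling moves the result across the 5% line.

```r
n_not_yes <- 60   # 'No' plus 'Not sure', from the question
n         <- 100

one_sided <- binom.test(n_not_yes, n, p = 0.5, alternative = "greater")$p.value
two_sided <- binom.test(n_not_yes, n, p = 0.5, alternative = "two.sided")$p.value

# One-sided rejects at 5%, two-sided does not
c(one_sided = one_sided, two_sided = two_sided)
```

This is exactly the case where a one-sided test chosen after seeing the data would look like significance hunting.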

Related Solutions

Solved – Statistical test for survey question with 3 response option ‘Yes’ ‘No’ ‘Not sure’

You map the responses to your definition of 'prefer':

    Response:             Prefers new product?
     Yes                   yes
     Don't know            no
     No                    no
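The mapping above can be written as a lookup vector in R, so that each raw answer becomes a yes/no outcome before testing. The `responses` vector below is hypothetical and only illustrates the step. An answer not covered by the mapping would become `NA`, so the sketch checks for that before running the test.

```r
# Mapping from survey answer to "prefers new product?"
mapping <- c("Yes" = TRUE, "Don't know" = FALSE, "No" = FALSE)

# Hypothetical raw responses (illustrative only)
responses <- c("Yes", "No", "Don't know", "Yes", "No", "Yes")

prefers <- unname(mapping[responses])
stopifnot(!anyNA(prefers))  # every answer must be covered by the mapping

binom.test(sum(prefers), length(prefers), p = 0.5, alternative = "greater")
```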

Keep in mind that a majority is >50%. Obviously you're not interested in whether a majority of your sample prefers it (you don't need a statistical test for that). You want to see if your sample result is more than could be explained by random variation about a population quantity that wasn't a majority.

The next thing to worry about is whether you're doing a one-sided or two-sided test. From what you say in your question I assume you want one-sided; that is, to test the null that no more than 50% of the population prefer the new product against the alternative that a majority prefer it. It may be that instead you want to pick up a majority in either direction, which would be a two-sided alternative.

So for the one-sided test you have

$H_0$: no more than 50% prefer the new product, versus $H_1$: a majority prefer the new product.

And you compare your sample proportion (of 80 out of 100) with the highest population proportion under the null.

You'd use a binomial test for this, though with $n=100$, you'd be able to use the normal (z-test) approximation.
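A minimal R comparison of the exact binomial test and the normal (z-test) approximation follows, for 80 out of 100 against a null proportion of 0.5. The z statistic is computed by hand as $z = (\hat p - p_0)/\sqrt{p_0(1-p_0)/n}$, which gives $z = 0.3/0.05 = 6$ here. Both p-values are tiny, so the two approaches lead to the same conclusion.

```r
x  <- 80     # respondents preferring the new product
n  <- 100
p0 <- 0.5    # highest population proportion allowed under H0

# Exact one-sided binomial test
exact <- binom.test(x, n, p = p0, alternative = "greater")$p.value

# Normal approximation (no continuity correction)
p_hat  <- x / n
z      <- (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
approx <- pnorm(z, lower.tail = FALSE)

c(z = z, exact = exact, approx = approx)
```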

Solved – Null and alternative hypothesis in a test using the hypergeometric distribution

I'm working on a similar problem. Since some of the links provided by @whuber are missing, here is how I would approach your problem. I'm curious to see whether my understanding of hypergeometric testing is sound, so I welcome any constructive feedback. My answers to the numbered questions above follow.

Q1 : I agree with @whuber

Q2 : It seems to me that it's the wrong H0 for your problem statement. If you use H0 = 50%, you might reject it whether the actual proportion of votes for candidate A is higher or lower than 50%. Taking H0 <= 50% as a one-sided test seems more appropriate.
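A minimal R sketch of the one-sided hypergeometric test follows. All numbers are hypothetical and were chosen only for illustration: a population of 10000, a sample of 1000 drawn without replacement, and 530 sampled votes for A. The upper-tail probability increases with the true number of A-voters, so evaluating it at the 50% boundary gives the p-value for the whole composite null H0 <= 50%.

```r
# Hypothetical numbers, for illustration only
N  <- 10000   # population size
n  <- 1000    # sample size, drawn without replacement
x  <- 530     # votes for candidate A observed in the sample
K0 <- N / 2   # boundary of H0: at most 50% of the population vote for A

# P(X >= x) when exactly K0 population members vote for A
p_hyper <- phyper(x - 1, K0, N - K0, n, lower.tail = FALSE)

# Binomial (sampling with replacement) p-value, for comparison
p_binom <- binom.test(x, n, p = 0.5, alternative = "greater")$p.value

c(hypergeometric = p_hyper, binomial = p_binom)
```

Sampling without replacement makes the count less variable, so the hypergeometric p-value comes out somewhat smaller than the binomial one.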

Q3 : Having restated your H0, you could now use a one-sided confidence interval calculation, as described in the following paper, to get a confidence interval on your estimated proportion (M): http://www.wright.edu/~weizhen.wang/paper/37-2015jasa_wang.pdf
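The linked paper's method is not reproduced here. As a simpler stand-in, the sketch below computes an exact one-sided lower confidence bound for the population count of A-voters by inverting the hypergeometric test. It uses the same hypothetical numbers as before: N = 10000, n = 1000, 530 sampled votes for A. When the alternative is "more than 50%", the one-sided bound to compare with 50% is a lower bound. The lower bound lies above N/2 exactly when the test at the boundary rejects.

```r
# Hypothetical numbers, for illustration only
N     <- 10000
n     <- 1000
x     <- 530
alpha <- 0.05

# Upper-tail probability P(X >= x) for every possible population count K
K_grid    <- 0:N
tail_prob <- phyper(x - 1, K_grid, N - K_grid, n, lower.tail = FALSE)

# Smallest K not rejected at level alpha: the one-sided lower bound
K_lower <- min(K_grid[tail_prob > alpha])
K_lower / N   # lower confidence bound on the proportion voting for A
```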

Q4 : This will follow from the previous calculation. Your upper bound on the proportion of votes for candidate A will be your critical value. If 540 is outside the interval, it means that there is a higher-than-chance probability that candidate A is actually ahead in the poll, and so you would reject the null hypothesis.
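The critical value can also be computed directly in R. It is the smallest sample count c with P(X >= c) <= alpha when the population is split exactly 50/50. `qhyper` returns the smallest q with P(X <= q) >= 1 - alpha, so c = q + 1. The population size, sample size, level and observed count below are hypothetical.

```r
# Hypothetical numbers, for illustration only
N     <- 10000
n     <- 1000
K0    <- N / 2   # boundary of H0
alpha <- 0.05
x     <- 530     # observed votes for A in the sample

crit   <- qhyper(1 - alpha, K0, N - K0, n) + 1   # reject H0 when x >= crit
reject <- x >= crit

c(critical_value = crit, reject = reject)
```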

Q5 : Not necessarily. You can test your hypothesis and compute the associated power level. But you can also set, a priori, the power level you need to be able to detect a specific effect size: http://www.real-statistics.com/sampling-distributions/statistical-power-sample/
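A rough R sketch of both uses follows. It relies on the binomial approximation (a large population relative to the sample) rather than the hypergeometric. The design values are assumptions, not taken from the question: a null boundary of 0.5, a true proportion of 0.6 to detect, and alpha = 0.05. The sketch first computes the power of the exact one-sided test at n = 100. It then searches a grid for the smallest sample size that reaches 80% power.

```r
# Hypothetical design values (assumptions for illustration)
p0    <- 0.5    # boundary of H0
p1    <- 0.6    # effect size we want to detect
alpha <- 0.05

# Power of the one-sided exact binomial test at sample size n
power_at <- function(n) {
  crit <- qbinom(1 - alpha, n, p0) + 1          # reject H0 when X >= crit
  pbinom(crit - 1, n, p1, lower.tail = FALSE)   # P(X >= crit) when p = p1
}

power_100 <- power_at(100)

# Smallest n in the grid whose power reaches 80%
n_grid   <- 10:400
n_needed <- n_grid[which(sapply(n_grid, power_at) >= 0.8)[1]]

c(power_at_100 = power_100, n_needed = n_needed)
```

Because the test statistic is discrete, power is not strictly increasing in n. Taking the first n in the grid that reaches the target is a pragmatic choice, not the only possible one.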

Thank you
