Hypothesis Testing for survey

binomial distribution, hypothesis testing, r

Suppose my null hypothesis is “Not more than 50% of people prefer milk” and my alternate hypothesis is “More than 50% (a majority) prefer milk”. The null and alternate hypotheses are set up this way only to test whether or not a majority prefer milk.

Then I set the question in the questionnaire as:
"Do you prefer milk? Yes$\,\Box\:\:$ No$\,\Box\:\:$ Not sure$\,\Box\,$".

After a survey of 100 respondents I found that 40% ticked ‘Yes’, 57% ticked ‘No’, and the rest (3%) ticked ‘Not sure’.

To test the hypothesis, I ran a one-sided binomial test in R using the command:

binom.test(40, 100, p=0.5, alternative="greater")

Then, on the basis of the test result, rejecting the alternate hypothesis and accepting the null hypothesis, I concluded that “Not more than 50% of people prefer milk”, but not that “A majority don’t prefer milk”.

So, given a ‘Yes/No/Not sure’ question with three response options (rather than two, i.e. Yes/No), how should the reasoning after that test be worded, or what statistical test should I perform, and how, in order to conclude that “A majority don’t prefer milk”?

Best Answer

You don't "reject the alternate hypothesis". You either reject the null or fail to reject it.

It's perfectly possible that you can conclude neither "a majority prefer milk" nor "a majority do not prefer milk" (because you cannot distinguish the proportion who prefer milk from 50%).

If you set up your hypotheses as:

$H_0$: Not more than 50% of the population prefer milk
$H_1$: More than 50% prefer milk

Then rejection of the null would lead to the conclusion that more than 50% prefer milk.

One must be very careful with hypothesis tests (especially one-sided ones). Failure to reject the null is not a positive assertion that the proportion who prefer milk is $\leq$ 50%.

--

You should explicitly clarify whether your hypothesis includes the 'not sure' as 'don't prefer'. It's probably better still to explicitly state it in terms of answers to the survey question and then interpret it in the conclusion. Consider this phrasing:

Let $p_Y$ be the population proportion who would answer 'Yes' in response to 'Do you prefer milk?'

$H_0$: $p_Y\leq \frac{1}{2}$
$H_1$: $p_Y> \frac{1}{2}$

This phrasing avoids the ambiguity in your question. What majority is being considered is then interpreted in the text, either before the test or in the conclusion -- or both.
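As a minimal sketch of this test in R, using the counts reported in the question (40 'Yes' out of 100): the 0.05 significance level and the variable names are assumptions for illustration, not something the question specifies.

```r
# Counts reported in the question: 40 'Yes' answers out of 100 respondents.
n_total <- 100
n_yes <- 40

# Exact one-sided binomial test of H0: p_Y <= 1/2 against H1: p_Y > 1/2.
test_yes <- binom.test(n_yes, n_total, p = 0.5, alternative = "greater")

# Assumed significance level (not stated in the question).
alpha <- 0.05
reject_null_yes <- test_yes$p.value < alpha

print(test_yes$p.value)
print(reject_null_yes)
```

With these counts the p-value is far above 0.05, so the null is not rejected. That only means the data give no evidence that $p_Y > \frac{1}{2}$; it is not evidence for $p_Y \leq \frac{1}{2}$.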

--

Alternatively if you set up your hypotheses as:

Let $p_N$ be the population proportion who would answer 'No' in response to 'Do you prefer milk?'

$H_0$: $p_N\leq \frac{1}{2}$
$H_1$: $p_N> \frac{1}{2}$

Then rejection of the null would lead to the conclusion that more than 50% do not prefer milk.
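A sketch of how this version could be run in R, using the 57 'No' answers out of 100 from the question. The 0.05 significance level and the variable names are illustrative assumptions.

```r
# Counts reported in the question: 57 'No' answers out of 100 respondents.
n_total <- 100
n_no <- 57

# Exact one-sided binomial test of H0: p_N <= 1/2 against H1: p_N > 1/2.
test_no <- binom.test(n_no, n_total, p = 0.5, alternative = "greater")

# Assumed significance level (not stated in the question).
alpha <- 0.05
reject_null_no <- test_no$p.value < alpha

print(test_no$p.value)
print(reject_null_no)
```

For these counts the p-value is above 0.05, so at that level the data do not support the conclusion that more than 50% would answer 'No'.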

--

If, instead, you want to investigate the sum of the two 'not-yes' categories, you'd say:

Let $p$ be the population proportion who would answer either 'No' or 'Not sure' in response to 'Do you prefer milk?'

$H_0$: $p\leq \frac{1}{2}$
$H_1$: $p> \frac{1}{2}$

Then rejection of the null would lead to the conclusion that more than 50% would not say they prefer milk. [If that was the test you were trying to carry out, I suggest you state it as explicitly as that, so that the ambiguities inherent in ordinary language are avoided.]
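A sketch of this combined version in R, pooling the 57 'No' and 3 'Not sure' answers from the question into 60 'not-yes' answers out of 100. The 0.05 significance level and the variable names are assumptions for illustration.

```r
# Counts reported in the question.
n_total <- 100
n_no <- 57
n_not_sure <- 3

# Pool the two 'not-yes' categories.
n_not_yes <- n_no + n_not_sure

# Exact one-sided binomial test of H0: p <= 1/2 against H1: p > 1/2,
# where p is the proportion answering 'No' or 'Not sure'.
test_not_yes <- binom.test(n_not_yes, n_total, p = 0.5, alternative = "greater")

# Assumed significance level (not stated in the question).
alpha <- 0.05
reject_null_not_yes <- test_not_yes$p.value < alpha

print(test_not_yes$p.value)
print(reject_null_not_yes)
```

Here the p-value falls below 0.05, so at that level the null would be rejected. Note that this conclusion depends entirely on counting 'Not sure' together with 'No', which is why the hypothesis should say so explicitly.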


A warning

One-sided hypotheses would normally be based on a clear a priori reason to think that a difference in the other direction would not occur or, at the very least, is of no interest. In general, if you don't have that, you'd use a two-sided test.

If you don't have good prior justification for a one-sided test, you'll likely be seen as significance hunting.
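To illustrate why the choice matters, the sketch below compares a two-sided and a one-sided test on the same data: 60 of 100 respondents answering 'No' or 'Not sure', taken from the counts in the question. The variable names are illustrative assumptions.

```r
# 57 'No' + 3 'Not sure' = 60 'not-yes' answers out of 100 (from the question).
n_total <- 100
n_not_yes <- 60

# Two-sided test of H0: p = 1/2 against H1: p != 1/2 (the default alternative).
test_two_sided <- binom.test(n_not_yes, n_total, p = 0.5)

# One-sided test of H0: p <= 1/2 against H1: p > 1/2.
test_one_sided <- binom.test(n_not_yes, n_total, p = 0.5, alternative = "greater")

print(test_two_sided$p.value)
print(test_one_sided$p.value)
```

Because the null distribution is symmetric at $p = \frac{1}{2}$, the two-sided p-value here is twice the one-sided one. On these counts the one-sided p-value is below 0.05 while the two-sided p-value is just above it, so picking the direction after seeing the data would change the conclusion.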
