I have survey data where participants were asked to choose twice between yes and no.
For this example, let's say (although I am not quite sure whether this is a good example):
- Choice1: Do you want to be acknowledged by your coworkers?
- Choice2: Do you want to be responsible if something goes wrong?
We found considerable differences in the number of yes and no answers to the two questions.
pacman::p_load(tidyverse)
# MRE Data
> df
       Choice2
Choice1 No Yes
    No   6   1
    Yes 22   6
# dput
structure(c(6L, 22L, 1L, 6L), .Dim = c(2L, 2L), .Dimnames = list(
Choice1 = c("No", "Yes"), Choice2 = c("No", "Yes")), class = "table")
I am interested in whether these differences are significant, i.e. whether substantially more
participants chose yes for Choice1 than for Choice2.
I thought I could analyze this with a $\chi^2$ test.
However, due to the small sample (and the small expected cell counts), I got a warning from chisq.test,
so I conducted Fisher's exact test instead.
# Chi² Test of Independence
chi <- chisq.test(df)
chi
# Expected cell counts
chi$expected
# Due to the small expected cell counts and the warning "Chi-squared approximation
# may be incorrect", conduct Fisher's exact test instead
fisher.test(df)
Fisher's Exact Test for Count Data
data: df
p-value = 1
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
0.1437625 87.6035655
sample estimates:
odds ratio
1.615585
What strikes me about the result is a p-value of 1.
Looking at the proportions, 80% voted yes for Choice1, but only 20% for Choice2.
That looks like a substantial difference.
# Print proportions
df %>%
  rbind("Prop" = (prop.table(df) %>% colSums()) * 100) %>%
  cbind("Prop" = c((prop.table(df) %>% rowSums()) * 100, 100))
# Choice1: Yes = 80%, Choice2: Yes = 20% -- how is p = 1?
      No Yes Prop
No     6   1   20
Yes   22   6   80
Prop  80  20  100
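As a side note, the marginal proportions can also be read off more directly with base R's `margin.table()` and `prop.table()` (a sketch using the `dput` data from above; margin 1 sums over rows, i.e. Choice1, margin 2 over columns, i.e. Choice2):

```r
# Reconstruct the 2x2 table from the dput above
df <- structure(c(6L, 22L, 1L, 6L), .Dim = c(2L, 2L), .Dimnames = list(
  Choice1 = c("No", "Yes"), Choice2 = c("No", "Yes")), class = "table")

# Marginal proportions per question, in percent
prop.table(margin.table(df, 1)) * 100  # Choice1: No = 20, Yes = 80
prop.table(margin.table(df, 2)) * 100  # Choice2: No = 80, Yes = 20
```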
Now I am wondering whether I am even using the right test. I know that the $\chi^2$ test is a test of independence, so the H1 would be that Choice1 and Choice2 are dependent. However, I am rather interested in knowing whether the proportions of Choice1 and Choice2 differ meaningfully.
And how does it come about that I get p = 1?
Edit: Created the differ variable
> df
# A tibble: 35 x 3
Choice1 Choice2 differ
<fct> <fct> <dbl>
1 Yes No 1
2 Yes No 1
3 Yes No 1
4 No No 0
5 No No 0
6 Yes No 1
7 Yes Yes 0
8 Yes No 1
9 Yes Yes 0
10 Yes No 1
# ... with 25 more rows
> df %>% dput()
structure(list(Choice1 = structure(c(2L, 2L, 2L, 1L, 1L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 2L,
2L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 1L, 2L, 2L), .Label = c("No",
"Yes"), class = "factor"), Choice2 = structure(c(1L, 1L, 1L,
1L, 1L, 1L, 2L, 1L, 2L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 2L, 1L, 1L,
1L, 2L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L
), .Label = c("No", "Yes"), class = "factor"), differ = c(1,
1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1, 1,
0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 0)), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -35L))
Solution Edit:
In addition to the provided answer, I would like to draw attention to the very helpful question that @Scortchi linked in the comments (see here). The answer provided by Gung really improved my understanding and helped me navigate. The correct test for my question would be either the binomial test (as mentioned in the accepted answer) or McNemar's $\chi^2$ test. Please refer to the link for more details on the reasoning behind them.
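For completeness, McNemar's test can be run directly on the original 2x2 table (a sketch; note that it only uses the discordant cells, here 22 and 1, and applies a continuity correction by default):

```r
# Reconstruct the 2x2 table from the dput above
df <- structure(c(6L, 22L, 1L, 6L), .Dim = c(2L, 2L), .Dimnames = list(
  Choice1 = c("No", "Yes"), Choice2 = c("No", "Yes")), class = "table")

# McNemar's chi-squared test for paired nominal data;
# only the off-diagonal (discordant) counts enter the statistic
mcnemar.test(df)
```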
Best Answer
You have set up your data to test independence. If you want to compare the proportions, you need the rows labelled with the choices and the columns with the responses. However, if these are the same people measured twice, then only the people whose responses differ are informative, so you need just those two numbers and then do a binomial test.
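In the table above, the discordant counts are 22 (Yes on Choice1, No on Choice2) and 1 (No on Choice1, Yes on Choice2); a sketch of the binomial test on just those two numbers:

```r
# Of the 22 + 1 = 23 participants whose two answers differed, test whether
# the 22/1 split is compatible with a fair 50/50 chance of each direction
binom.test(x = 22, n = 23, p = 0.5)
```

This is the exact counterpart of McNemar's test: the concordant pairs (No/No and Yes/Yes) carry no information about which question attracts more yes answers.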