The questions on a survey asked:
- Do you actively participate in a study group?
- Do you think the class is going too quickly?
For both, the responses are one of the following: strongly agree, agree, neutral, disagree, strongly disagree.
So I want to analyze whether there is a correlation between being in a study group and thinking the class is going too quickly.
So I have two columns in a data frame. The first column is labeled "group" and the second is "fast". I have converted the group variable into a numerical variable, like so: strongly agree = 5, agree = 4, neutral = 3, disagree = 2, strongly disagree = 1
So now I have a data frame with two columns, one full of numbers and the other still full of the original answers ("strongly agree", "agree", etc).
I have found the mean "group" score for every "fast" option, and now I just need to see whether the difference is statistically significant, but I am clueless. How exactly should I calculate the p-value for this? I have tried several methods, but the p-value seems wrong.
Sorry if this is easy stuff, I think I have made this way more complicated in my mind than it should be and I appreciate any help.
quickly <- CSExperiencesAllWithHeaders$CEQuickly
groups <-CSExperiencesAllWithHeaders$CEStudyGroup
levels(groups) <- (c(levels(groups), 5, 4, 3, 2, 1))
groups[groups == "strongly agree"] <- 5
groups[groups == "agree"] <- 4
groups[groups == "neutral"] <- 3
groups[groups == "disagree"] <-2
groups[groups == "strongly disagree"] <- 1
groups[groups == ""] <- NA
groups[groups == "N/A"] <- NA
quickly[quickly == "N/A"] <- NA
quickly[quickly == ""] <- NA
groups <- factor(groups)
quickly <- factor(quickly)
analysis3 <- data.frame(groups,quickly)
analysis3 <- na.omit(analysis3)
analysis3$groups <- as.numeric(as.character(analysis3$groups))
sagree2 <- subset(analysis3, quickly == "strongly agree")
agree2 <- subset(analysis3, quickly == "agree")
neutral2 <- subset(analysis3, quickly == "neutral")
disagree2 <- subset(analysis3, quickly == "disagree")
sdisagree2 <- subset(analysis3, quickly == "strongly disagree")
meansagree2 <- mean(sagree2$groups)
meanagree2 <- mean(agree2$groups)
meanneutral2 <- mean(neutral2$groups)
meandisagree2 <- mean(disagree2$groups)
meansdisagree2 <- mean(sdisagree2$groups)
barplot(c(meansagree2, meanagree2, meanneutral2, meandisagree2,
meansdisagree2),
main = "Those Who Think Class is Too Quick: In Study Groups?",
names.arg=c("Strongly Agree","Agree","Neutral","Disagree",
"Strongly Disagree"),
xlab = "Class too Quick?",
ylab = "In a Study Group?")
All this code creates the following data frame (I only took the top of it, since the real one has over 1000 rows):
groups quickly
1 5 'strongly disagree'
2 4 'strongly agree'
3 1 'disagree'
4 1 'disagree'
5 4 'strongly disagree'
6 2 'strongly disagree'
7 1 'neutral'
8 2 'disagree'
9 1 'strongly disagree'
10 2 'strongly disagree'
11 1 'strongly disagree'
12 2 'neutral'
13 5 'disagree'
14 2 'disagree'
15 4 'neutral'
16 2 'disagree'
17 5 'disagree'
18 5 'neutral'
19 4 'strongly disagree'
20 2 'strongly disagree'
21 3 'disagree'
22 1 'strongly disagree'
23 4 'strongly agree'
24 1 'strongly disagree'
26 5 'strongly disagree'
27 1 'strongly disagree'
28 5 'disagree'
29 5 'agree'
This is what I get when I use the dput function:
structure(list(groups = c(5, 4, 1, 1, 4, 2, 1, 2, 1, 2, 1, 2,
5, 2, 4, 2, 5, 5, 4, 2, 3, 1, 4, 1, 5, 1, 5, 5, 5, 5), quickly = structure(c(5L,
4L, 2L, 2L, 5L, 5L, 3L, 2L, 5L, 5L, 5L, 3L, 2L, 2L, 3L, 2L, 2L,
3L, 5L, 5L, 2L, 5L, 4L, 5L, 5L, 5L, 2L, 1L, 2L, 3L), .Label = c("agree",
"disagree", "neutral", "strongly agree", "strongly disagree"), class = "factor"),
qui_fact = structure(c(5L, 1L, 4L, 4L, 5L, 5L, 3L, 4L, 5L,
5L, 5L, 3L, 4L, 4L, 3L, 4L, 4L, 3L, 5L, 5L, 4L, 5L, 1L, 5L,
5L, 5L, 4L, 2L, 4L, 3L), .Label = c("strongly agree", "agree",
"neutral", "disagree", "strongly disagree"), class = "factor"),
qui_num = c(5, 1, 4, 4, 5, 5, 3, 4, 5, 5, 5, 3, 4, 4, 3,
4, 4, 3, 5, 5, 4, 5, 1, 5, 5, 5, 4, 2, 4, 3)), .Names = c("groups",
"quickly", "qui_fact", "qui_num"), na.action = structure(c(25L,
31L, 37L, 38L, 86L, 91L, 148L, 209L, 270L, 280L, 285L, 328L,
338L, 340L, 410L, 424L, 456L, 460L, 461L, 480L, 568L, 587L, 593L,
596L, 599L, 600L, 607L, 621L, 658L, 700L, 717L, 731L, 758L, 776L,
827L, 837L, 849L, 862L, 864L, 896L, 899L, 909L, 921L, 946L, 963L,
966L, 977L, 994L, 1007L, 1012L, 1074L, 1079L), .Names = c("25",
"31", "37", "38", "86", "91", "148", "209", "270", "280", "285",
"328", "338", "340", "410", "424", "456", "460", "461", "480",
"568", "587", "593", "596", "599", "600", "607", "621", "658",
"700", "717", "731", "758", "776", "827", "837", "849", "862",
"864", "896", "899", "909", "921", "946", "963", "966", "977",
"994", "1007", "1012", "1074", "1079"), class = "omit"), row.names = c(1L,
2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L,
16L, 17L, 18L, 19L, 20L, 21L, 22L, 23L, 24L, 26L, 27L, 28L, 29L,
30L, 32L), class = "data.frame")
Best Answer
As you have ordinal factors, means are not so useful. You could use a $\chi^2$ test of independence and/or a Spearman rank correlation to check whether the two variables are associated.
Commands:
chisq.test(analysis3$groups,analysis3$quickly)
After converting your "quickly" strings to a factor, reordering its levels, and extracting the level codes as a numeric vector, you can apply the Spearman correlation:
analysis3$qui_fact <- as.factor(analysis3$quickly)
levels(analysis3$qui_fact) # alphabetical levels
# reorder to: strongly agree, agree, neutral, disagree, strongly disagree
analysis3$qui_fact <- factor(analysis3$qui_fact, levels(analysis3$qui_fact)[c(4, 1, 3, 2, 5)])
# note: this codes "strongly agree" as 1, the reverse of groups, so the sign of rho is flipped (the p-value is unaffected)
analysis3$qui_num <- as.numeric(analysis3$qui_fact)
cor.test(analysis3$groups, analysis3$qui_num, alternative = "two.sided", method = "spearman", conf.level = .99)
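The two tests above can also be run as one self-contained sketch on the 30 rows from the dput() output in the question. This is only an illustration under some assumptions. The names scale_levels and quickly_num are made up for this example. Here "quickly" is coded in the same direction as "groups" (strongly agree = 5), so a positive rho means the two kinds of agreement go together. The options exact = FALSE and simulate.p.value = TRUE are one possible way to cope with the many ties and the small cell counts, not the only one.

```r
# Sample data: the 30 rows shown in the dput() output of the question.
groups <- c(5, 4, 1, 1, 4, 2, 1, 2, 1, 2, 1, 2, 5, 2, 4, 2, 5, 5, 4, 2,
            3, 1, 4, 1, 5, 1, 5, 5, 5, 5)
quickly_codes <- c(5, 4, 2, 2, 5, 5, 3, 2, 5, 5, 5, 3, 2, 2, 3, 2, 2,
                   3, 5, 5, 2, 5, 4, 5, 5, 5, 2, 1, 2, 3)
alphabetical <- c("agree", "disagree", "neutral",
                  "strongly agree", "strongly disagree")
quickly <- alphabetical[quickly_codes]  # back to the original answer strings

# Hypothetical helper names: code "quickly" like "groups"
# (strongly disagree = 1, ..., strongly agree = 5).
scale_levels <- c("strongly disagree", "disagree", "neutral",
                  "agree", "strongly agree")
quickly_num <- as.integer(factor(quickly, levels = scale_levels))

# Spearman rank correlation; exact = FALSE requests the asymptotic
# p-value, since an exact one cannot be computed with ties.
spearman <- cor.test(groups, quickly_num, method = "spearman", exact = FALSE)

# Chi-squared test of independence; with sparse cells the usual
# approximation is unreliable, so the p-value is simulated instead.
set.seed(1)  # only to make the simulated p-value reproducible
chisq <- chisq.test(table(groups, quickly_num),
                    simulate.p.value = TRUE, B = 2000)

print(spearman)
print(chisq)
```

On the full data set, the same calls would take analysis3$groups and the recoded analysis3$quickly in place of the sample vectors.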