Solved – Correlation between two variables measured on a “strongly agree” to “strongly disagree” scale

ordinal-data, p-value, r, survey

The questions on a survey asked:

  • Do you actively participate in a study group?
  • Do you think the class is going too quickly?

For both, the responses are one of the following: strongly agree, agree, neutral, disagree, strongly disagree

So I want to analyze whether there is a correlation between those who are in a study group and those who think the class is going too quickly.

So I have two columns in a data frame. The first column is labeled "group" and the second is "fast". I have converted the group variable into a numerical variable, like so: strongly agree = 5, agree = 4, neutral = 3, disagree = 2, strongly disagree = 1

So now I have a data frame with two columns, one full of numbers and the other still full of the original answers ("strongly agree", "agree", etc).

I have found the means of every option quickly; now I just need to see whether there is statistical significance, but I am clueless. How exactly should I calculate the p-value for this? I have tried several methods, but the p-value seems wrong.

Sorry if this is easy stuff, I think I have made this way more complicated in my mind than it should be and I appreciate any help.

quickly <- CSExperiencesAllWithHeaders$CEQuickly
groups <- CSExperiencesAllWithHeaders$CEStudyGroup

levels(groups) <- (c(levels(groups), 5, 4, 3, 2, 1))
groups[groups == "strongly agree"] <- 5
groups[groups == "agree"] <- 4
groups[groups == "neutral"] <- 3
groups[groups == "disagree"] <- 2
groups[groups == "strongly disagree"] <- 1
groups[groups == ""] <- NA
groups[groups == "N/A"] <- NA
quickly[quickly == "N/A"] <- NA
quickly[quickly == ""] <- NA

groups <- factor(groups)
quickly <- factor(quickly)
analysis3 <- data.frame(groups,quickly)
analysis3 <- na.omit(analysis3)
analysis3$groups <- as.numeric(as.character(analysis3$groups))

sagree2 <- subset(analysis3, quickly  == "strongly agree")
agree2 <- subset(analysis3, quickly == "agree")
neutral2 <- subset(analysis3, quickly == "neutral")
disagree2 <- subset(analysis3, quickly == "disagree")
sdisagree2 <- subset(analysis3, quickly == "strongly disagree")

meansagree2 <- mean(sagree2$groups)
meanagree2 <- mean(agree2$groups)
meanneutral2 <- mean(neutral2$groups)
meandisagree2 <- mean(disagree2$groups)
meansdisagree2 <- mean(sdisagree2$groups)

barplot(c(meansagree2, meanagree2, meanneutral2, meandisagree2, 
          meansdisagree2),
        main = "Those Who Think Class is Too Quick: In Study Groups?",
        names.arg=c("Strongly Agree","Agree","Neutral","Disagree", 
                    "Strongly Disagree"),
        xlab = "Class too Quick?",
        ylab = "In a Study Group?")
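The five subset-and-mean steps above can be sketched more compactly with `tapply`. This is only an illustration under assumptions: the data frame below is a toy stand-in built from the first eight rows shown in the question, and `likert` and `means` are hypothetical names, not part of the original code.

```r
# Toy stand-in for analysis3: the first eight rows shown in the question.
analysis3 <- data.frame(
  groups  = c(5, 4, 1, 1, 4, 2, 1, 2),
  quickly = c("strongly disagree", "strongly agree", "disagree", "disagree",
              "strongly disagree", "strongly disagree", "neutral", "disagree")
)

# Hypothetical helper vector: the response options in the order used for the plot.
likert <- c("strongly agree", "agree", "neutral", "disagree", "strongly disagree")
analysis3$quickly <- factor(analysis3$quickly, levels = likert)

# Mean of groups within each level of quickly; levels with no rows give NA.
means <- tapply(analysis3$groups, analysis3$quickly, mean)
print(means)

# The named vector could then be passed straight to barplot(), e.g.
# barplot(means, xlab = "Class too Quick?", ylab = "In a Study Group?")
```

Setting the level order explicitly keeps the bars in the intended order rather than the alphabetical default.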

All this code creates this data frame (I only took the top of the data frame since the real one has over 1000 rows):

    groups  quickly
1   5   'strongly disagree'
2   4   'strongly agree'
3   1   'disagree'
4   1   'disagree'
5   4   'strongly disagree'
6   2   'strongly disagree'
7   1   'neutral'
8   2   'disagree'
9   1   'strongly disagree'
10  2   'strongly disagree'
11  1   'strongly disagree'
12  2   'neutral'
13  5   'disagree'
14  2   'disagree'
15  4   'neutral'
16  2   'disagree'
17  5   'disagree'
18  5   'neutral'
19  4   'strongly disagree'
20  2   'strongly disagree'
21  3   'disagree'
22  1   'strongly disagree'
23  4   'strongly agree'
24  1   'strongly disagree'
26  5   'strongly disagree'
27  1   'strongly disagree'
28  5   'disagree'
29  5   'agree'

This is what I get when I use the dput function:

structure(list(groups = c(5, 4, 1, 1, 4, 2, 1, 2, 1, 2, 1, 2,
5, 2, 4, 2, 5, 5, 4, 2, 3, 1, 4, 1, 5, 1, 5, 5, 5, 5), quickly = structure(c(5L,
4L, 2L, 2L, 5L, 5L, 3L, 2L, 5L, 5L, 5L, 3L, 2L, 2L, 3L, 2L, 2L,
3L, 5L, 5L, 2L, 5L, 4L, 5L, 5L, 5L, 2L, 1L, 2L, 3L), .Label = c("agree",
"disagree", "neutral", "strongly agree", "strongly disagree"), class = "factor"),
qui_fact = structure(c(5L, 1L, 4L, 4L, 5L, 5L, 3L, 4L, 5L,
5L, 5L, 3L, 4L, 4L, 3L, 4L, 4L, 3L, 5L, 5L, 4L, 5L, 1L, 5L,
5L, 5L, 4L, 2L, 4L, 3L), .Label = c("strongly agree", "agree",
"neutral", "disagree", "strongly disagree"), class = "factor"),
qui_num = c(5, 1, 4, 4, 5, 5, 3, 4, 5, 5, 5, 3, 4, 4, 3,
4, 4, 3, 5, 5, 4, 5, 1, 5, 5, 5, 4, 2, 4, 3)), .Names = c("groups",
"quickly", "qui_fact", "qui_num"), na.action = structure(c(25L,
31L, 37L, 38L, 86L, 91L, 148L, 209L, 270L, 280L, 285L, 328L,
338L, 340L, 410L, 424L, 456L, 460L, 461L, 480L, 568L, 587L, 593L,
596L, 599L, 600L, 607L, 621L, 658L, 700L, 717L, 731L, 758L, 776L,
827L, 837L, 849L, 862L, 864L, 896L, 899L, 909L, 921L, 946L, 963L,
966L, 977L, 994L, 1007L, 1012L, 1074L, 1079L), .Names = c("25",
"31", "37", "38", "86", "91", "148", "209", "270", "280", "285",
"328", "338", "340", "410", "424", "456", "460", "461", "480",
"568", "587", "593", "596", "599", "600", "607", "621", "658",
"700", "717", "731", "758", "776", "827", "837", "849", "862",
"864", "896", "899", "909", "921", "946", "963", "966", "977",
"994", "1007", "1012", "1074", "1079"), class = "omit"), row.names = c(1L,
2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L,
16L, 17L, 18L, 19L, 20L, 21L, 22L, 23L, 24L, 26L, 27L, 28L, 29L,
30L, 32L), class = "data.frame")

Best Answer

Because both variables are ordinal, means are not very informative. You could use a $\chi^2$ test of independence and/or Spearman's rank correlation to check whether the two variables are associated.

Commands:

chisq.test(analysis3$groups, analysis3$quickly)
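As a rough, self-contained sketch of the $\chi^2$ step, the snippet below uses only the 30 rows from the `dput` output in the question. With so few rows spread over a 5 x 5 table the expected counts are small, so this sketch asks for a Monte Carlo p-value. That choice is an assumption made for the excerpt and may be unnecessary on the full data. The names `tab`, `expected`, and `res` and the seed are arbitrary. Note also that the $\chi^2$ test treats the categories as unordered.

```r
# The 30 rows from the dput() output in the question.
groups  <- c(5, 4, 1, 1, 4, 2, 1, 2, 1, 2, 1, 2, 5, 2, 4, 2, 5, 5, 4, 2,
             3, 1, 4, 1, 5, 1, 5, 5, 5, 5)
qui_num <- c(5, 1, 4, 4, 5, 5, 3, 4, 5, 5, 5, 3, 4, 4, 3, 4, 4, 3, 5, 5,
             4, 5, 1, 5, 5, 5, 4, 2, 4, 3)

# Contingency table of the two responses.
tab <- table(groups, qui_num)

# Expected counts under independence: row total * column total / grand total.
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)
print(round(expected, 2))

# Monte Carlo p-value instead of the chi-squared approximation,
# because the expected counts in this excerpt are small.
set.seed(1)  # arbitrary seed, only for reproducibility
res <- chisq.test(tab, simulate.p.value = TRUE, B = 10000)
print(res)
```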

After converting your "quickly" strings to a factor, reordering its levels, and extracting the level codes into a numeric vector, you can apply Spearman correlation:

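analysis3$qui_fact <- as.factor(analysis3$quickly)
levels(analysis3$qui_fact)  # alphabetical levels
# reorder as needed: strongly agree, agree, neutral, disagree, strongly disagree
analysis3$qui_fact <- factor(analysis3$qui_fact, levels(analysis3$qui_fact)[c(4, 1, 3, 2, 5)])
# numeric codes: 1 = strongly agree, ..., 5 = strongly disagree
# (the reverse of the coding used for groups, so the sign of rho is flipped)
analysis3$qui_num <- as.numeric(analysis3$qui_fact)
cor.test(analysis3$groups, analysis3$qui_num, alternative = "two.sided", method = "spearman", conf.level = .99)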
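One thing to watch with index-based reordering is the direction of the resulting codes. In the commands above, "strongly agree" ends up as 1 for the "quickly" variable, while the question coded it as 5 for groups. A possible alternative, sketched here on the 30 rows from the `dput` output, is to spell out the level order by name so that both variables run in the same direction. The names `codes`, `alpha_levels`, and `likert` are hypothetical, and `exact = FALSE` is an assumption made because the data contain many ties.

```r
# The 30 rows from the dput() output in the question.
groups <- c(5, 4, 1, 1, 4, 2, 1, 2, 1, 2, 1, 2, 5, 2, 4, 2, 5, 5, 4, 2,
            3, 1, 4, 1, 5, 1, 5, 5, 5, 5)        # 5 = strongly agree
codes  <- c(5, 4, 2, 2, 5, 5, 3, 2, 5, 5, 5, 3, 2, 2, 3, 2, 2, 3, 5, 5,
            2, 5, 4, 5, 5, 5, 2, 1, 2, 3)        # integer codes of "quickly"
alpha_levels <- c("agree", "disagree", "neutral",
                  "strongly agree", "strongly disagree")
quickly <- alpha_levels[codes]                   # back to the original strings

# Name the levels explicitly, lowest to highest, so that
# 1 = strongly disagree and 5 = strongly agree, matching groups.
likert <- c("strongly disagree", "disagree", "neutral",
            "agree", "strongly agree")
qui_fact <- factor(quickly, levels = likert)
qui_num  <- as.integer(qui_fact)

# Spearman rank correlation; with ties an exact p-value is not available,
# so request the asymptotic one.
res <- cor.test(groups, qui_num, method = "spearman", exact = FALSE)
print(res$estimate)
print(res$p.value)
```

Reversing one scale only flips the sign of rho; the two-sided p-value stays the same, so the choice mainly affects how the direction of the association is read.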
