Hypothesis Testing – How to Analyze Likert Scale Data for Accurate Insights

hypothesis-testing, likert

I'm planning to conduct a visual perception study in which each of 10 participants is shown 40 stimulus images. Each stimulus has two target areas, and the question is which target area is perceived as brighter. Participants will indicate their choices on a 5-point Likert scale ranging from "the left target is definitely brighter" to "the right target is definitely brighter". The participants are not divided into groups.

The question I want to answer is whether most participants agree that, for a given stimulus, one target is brighter than the other; i.e., I want to analyse the variability of their answers. What statistical method should I use for this?

Best Answer

For each image, I assume that one spot (left or right) truly is the brighter one, and that you know which spot that is. Then for each image a person sees, you can assign a score of -2, -1, 0, 1, or 2, where 2 is a strong correct response, -1 is a weak incorrect response, and so on.
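A minimal R sketch of this scoring, assuming the raw responses are coded 1 to 5, from "the left target is definitely brighter" (1) to "the right target is definitely brighter" (5). Both that coding and the function name `score_response` are hypothetical; adapt them to however the responses are actually recorded.

```r
# Hypothetical coding (an assumption): raw response 1 = "the left target is
# definitely brighter", 3 = undecided, 5 = "the right target is definitely brighter".
# `brighter` names the side that truly is brighter: "left" or "right".
score_response <- function(response, brighter) {
  stopifnot(all(response %in% 1:5), all(brighter %in% c("left", "right")))
  # +1 when leaning left is the correct direction, -1 when leaning right is
  direction <- ifelse(brighter == "left", 1, -1)
  # 3 - response runs from +2 (strongly left) to -2 (strongly right)
  direction * (3 - response)
}

score_response(1:5, "left")   # 2 1 0 -1 -2
score_response(1:5, "right")  # -2 -1 0 1 2
```

Both arguments are vectorised, so a whole column of responses can be scored together with a matching column of true sides.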

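Summarizing each participant's 40 responses by their median gives ten median scores, each between -2 and +2 on this scale. For example, one participant's 40 responses might be as in x below, with median $1$. (The fictitious data in this answer use scores from -3 to 3, as a 7-point scale would, but the procedure is the same.)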

x
 [1]  2  0 -1  3  2  1 -3  3  2 -3  2 -1  2  1  2  2  1  3  1  2
[21] -3  2  3  0  1 -2  2  1  1  2  1  1  3 -1  1  1  2  1  0  1
sort(x)
 [1] -3 -3 -3 -2 -1 -1 -1  0  0  0  1  1  1  1  1  1  1  1  1  1
[21]  1  1  1  2  2  2  2  2  2  2  2  2  2  2  2  3  3  3  3  3
median(x)
[1] 1
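With all responses stored as a participants-by-stimuli matrix, the ten medians can be computed in one step. The sketch below uses simulated, fictitious scores; the matrix layout and the response probabilities are assumptions made only for illustration.

```r
# Simulated, fictitious scores for illustration only: 10 participants
# (rows) x 40 stimuli (columns), each score in -2..2 as defined above.
set.seed(1)
n_participants <- 10
n_stimuli <- 40
scores <- matrix(
  sample(-2:2, n_participants * n_stimuli, replace = TRUE,
         prob = c(0.10, 0.15, 0.20, 0.30, 0.25)),  # assumed: skewed toward correct
  nrow = n_participants, ncol = n_stimuli
)

# One median per participant (row). With 40 responses the median is the
# average of the 20th and 21st sorted values, so it can be a half-integer.
s <- apply(scores, 1, median)
s
table(s)
```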

Then your ten subjects might have median scores s as follows.

s
[1]  2 -3  3  1  1  0  2  1  3  3

table(s)
s
-3  0  1  2  3 
 1  1  3  2  3 

Among the ten scores in s (fictitious data), one participant has a median below 0, one has a median of exactly 0, and eight have medians above 0. Discarding the zero, a one-sided sign test on the nine non-zero scores has P-value $P(S \ge 8 \mid H_0) = 0.0195 < 0.05,$ where $S \sim \mathrm{Binom}(9, 1/2)$ under $H_0$, so the result is significant at the 5% level. The null hypothesis is that participants can't meaningfully distinguish dark and light areas, so each has a 50-50 chance of getting a positive (rather than negative) median score.

1 - pbinom(7, 9, .5)
[1] 0.01953125
sum(dbinom((8:9), 9, .5))
[1] 0.01953125
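Base R's `binom.test` performs the same exact sign test in one call. In this sketch the removal of zero medians and the count of positive ones are done explicitly, using the fictitious s from above.

```r
# Fictitious participant medians from the example above
s <- c(2, -3, 3, 1, 1, 0, 2, 1, 3, 3)

nonzero <- s[s != 0]          # the sign test discards medians equal to 0
n_pos <- sum(nonzero > 0)     # 8 positive medians out of 9 non-zero ones

# Exact one-sided binomial (sign) test of H0: P(positive median) = 1/2
res <- binom.test(n_pos, length(nonzero), p = 0.5, alternative = "greater")
res$p.value                   # 0.01953125, matching 1 - pbinom(7, 9, .5)
```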

Notes: (1) A sign test based on ten participants does not have good power, so even if some participants are good at identifying dark vs. light regions, you may not detect this effect with only ten participants.

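(2) A one-sample nonparametric Wilcoxon signed-rank test is probably not a good choice here: with a small number of participants, it does not work well when there are ties (or zeros), and tied medians are very likely in this experiment. In that case R cannot compute an exact P-value and falls back on a normal approximation with continuity correction, as the output and warnings below show.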

wilcox.test(s, mu=0)

        Wilcoxon signed rank test with continuity correction

data:  s
V = 37.5, p-value = 0.08171
alternative hypothesis: true location is not equal to 0

Warning messages:
1: In wilcox.test.default(s, mu = 0) :
  cannot compute exact p-value with ties
2: In wilcox.test.default(s, mu = 0) :
  cannot compute exact p-value with zeroes

(3) Alternatively, one might compute each participant's mean response and hope the ten means are close enough to normal for a one-sample t test. But this requires treating ordinal (categorical) scores as numerical values, and that interpretation is controversial.
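If one does accept treating the scores as numbers, the mechanics are short. This sketch uses simulated, fictitious scores (the response probabilities are assumptions) and only shows how the test would be run, not whether it is appropriate.

```r
# Simulated, fictitious scores for illustration only: 10 participants x 40
# stimuli, scores in -2..2 (the probabilities below are assumptions, not data).
set.seed(2)
scores <- matrix(
  sample(-2:2, 10 * 40, replace = TRUE, prob = c(0.10, 0.15, 0.20, 0.30, 0.25)),
  nrow = 10
)

# Treat the ordinal scores as numbers and average them per participant
participant_means <- rowMeans(scores)

# One-sided, one-sample t test of H0: population mean score = 0
res <- t.test(participant_means, mu = 0, alternative = "greater")
res$statistic
res$p.value
```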

(4) A better experimental design might use more than ten participants and, if necessary, fewer than 40 images per participant (e.g., 20 participants, each viewing 20 images).
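To put numbers on notes (1) and (4), the exact power of the one-sided sign test can be computed from the binomial distribution. This is a sketch under assumptions: all n participants have non-zero medians, and each participant's median is independently positive with probability `p_pos` (a hypothetical effect size, here 0.7). The function name `sign_test_power` is hypothetical.

```r
# Exact power of a one-sided sign test at level alpha, assuming n participants
# with non-zero medians, each independently positive with probability p_pos.
sign_test_power <- function(n, p_pos, alpha = 0.05) {
  k <- 0:n
  # P(S >= k) under H0, where S ~ Binomial(n, 1/2)
  upper_tail_h0 <- pbinom(k - 1, n, 0.5, lower.tail = FALSE)
  # Smallest count of positive medians that rejects H0 at level alpha
  k_crit <- min(k[upper_tail_h0 <= alpha])
  # Probability of reaching that count when P(positive median) = p_pos
  pbinom(k_crit - 1, n, p_pos, lower.tail = FALSE)
}

sign_test_power(10, 0.7)  # about 0.149
sign_test_power(20, 0.7)  # about 0.416
```

Under these assumptions, going from 10 to 20 participants nearly triples the power, although it remains modest for this effect size.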
