How to test for differences between medians of multiple Likert items

likert, median, ordinal-data

In a questionnaire study, we asked respondents how different winter climate factors, such as snow and slipperiness, might affect their choice to walk or cycle to work. The sample comprised 500 individuals, and answers were given on a 5-point rating scale from very negative to very positive (ordinal scale).

To compare the responses to different questions, I assume the median is the proper summary, since the data are ordinal. I know there are statistical tests for whether a difference between means is significant (a t-test or a non-parametric test), but I am unsure whether I can use these tests on the type of data described here.

  • Is there a test statistic for comparing medians?
  • Or should I convert the data to an interval scale, if that is appropriate?

Best Answer

I find the mean to be a much more useful indicator of the central tendency of Likert items than the median. I have elaborated on this argument in my answer to a question about whether to use the mean or the median for Likert items.

A recap of some of these reasons:

  • The mean is more informative; the median is too coarse for Likert items. For example, the median of 1 1 3 3 3 is the same as that of 3 3 3 5 5 (i.e., 3), but the mean reflects the difference (2.2 versus 3.8).
  • Likert items are often phrased in ways where the assumption of equal distance between categories is a useful starting point.
  • Even if individual responses are discrete, the group-level measurement approaches continuity (with 500 people and a 5-point scale, the mean can take on 500 * 4 + 1 = 2001 different values).
  • There is little argument that a percentage is a useful summary for yes-no questions (e.g., voting). This is just the mean where responses have been coded 0 and 1. Treating a 5-point Likert scale as 1 2 3 4 5 seems almost as natural to me.
  • Other plausible scalings of the Likert items probably won't substantively change inferences about whether differences between means exist (but you can check this).
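The first and third points above can be checked in a few lines. This is only an illustrative sketch: the helper name `distinct_mean_values` is made up for this example, and it assumes the categories are coded as consecutive integers 1 to 5.

```python
from statistics import mean, median

# The two small samples from the first bullet point.
a = [1, 1, 3, 3, 3]
b = [3, 3, 3, 5, 5]

print(median(a), median(b))  # both medians are 3
print(mean(a), mean(b))      # the means differ: 2.2 and 3.8


def distinct_mean_values(n_respondents, n_categories):
    """Count the values a sample mean can take (hypothetical helper).

    With integer codes 1..k, the sum of n responses ranges from n to n*k
    in steps of 1, so there are n*(k-1) + 1 possible sums, and therefore
    the same number of possible means.
    """
    return n_respondents * (n_categories - 1) + 1


print(distinct_mean_values(500, 5))  # 2001
```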

If you are persuaded that the mean is the appropriate measure of central tendency, then you would want to structure your hypothesis tests so that they test for differences between means. A paired-samples t-test allows a pairwise comparison of means, although the p-values may not be exact given the discrete and non-normal error distribution. Nonetheless, adopting a non-parametric approach is not a solution, because it changes the hypothesis: a rank-based test such as the Wilcoxon signed-rank test no longer tests whether the means are equal.
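As a rough sketch of what that pairwise comparison looks like, the code below computes the paired t statistic by hand using only the Python standard library. Everything in it is an assumption made for illustration: the item names `snow` and `ice`, the response weights, and the alternative coding `alt_coding` are invented, and the p-value uses a normal approximation to the t distribution, which is only reasonable because n = 500 is large. It also reruns the test under a different scoring of the categories, which is one way to do the check suggested in the last bullet point above.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, stdev


def paired_t(x, y):
    """Return (t statistic, degrees of freedom) for paired samples x and y."""
    d = [xi - yi for xi, yi in zip(x, y)]
    n = len(d)
    t = mean(d) / (stdev(d) / sqrt(n))
    return t, n - 1


def approx_two_sided_p(t):
    """Two-sided p-value from a normal approximation (assumes large n)."""
    return 2 * (1 - NormalDist().cdf(abs(t)))


# Invented responses for two items from the same 500 respondents.
# Real paired responses would usually be correlated; these are not.
random.seed(0)
scale = [1, 2, 3, 4, 5]
snow = random.choices(scale, weights=[30, 30, 20, 15, 5], k=500)
ice = random.choices(scale, weights=[40, 30, 15, 10, 5], k=500)

t, df = paired_t(snow, ice)
print("equal spacing:", t, df, approx_two_sided_p(t))

# Sensitivity check: a hypothetical alternative scoring in which the
# extreme categories sit further from the middle ones.
alt_coding = {1: 1.0, 2: 2.5, 3: 3.0, 4: 3.5, 5: 5.0}
t_alt, _ = paired_t([alt_coding[v] for v in snow],
                    [alt_coding[v] for v in ice])
print("alternative coding:", t_alt, approx_two_sided_p(t_alt))
```

If the conclusion is the same under both codings, the equal-distance assumption is not driving the result.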

I would expect the paired-samples t-test to be fairly robust, at least for typical Likert item means that avoid either extreme of the scale, but I don't have any simulation studies on hand.
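One way to probe that expectation yourself is a small Monte Carlo check of the false-positive rate. The sketch below is not a substitute for a proper simulation study. It assumes both items are drawn independently from the same made-up response distribution (so the null hypothesis of equal means is true), uses n = 100 respondents per replication, and counts how often the paired t-test rejects at the nominal 5% level. The function name `rejection_rate` and both sets of weights are invented for illustration.

```python
import random
from math import sqrt
from statistics import mean, stdev

N = 100              # respondents per simulated study
T_CRIT_DF99 = 1.984  # two-sided 5% critical value of Student's t, 99 df


def rejection_rate(weights, reps=2000, seed=1):
    """Share of simulated studies in which the paired t-test rejects.

    Both items use the same response distribution, so every rejection
    is a false positive; a well-behaved test should give about 0.05.
    """
    rng = random.Random(seed)
    scale = [1, 2, 3, 4, 5]
    rejections = 0
    for _ in range(reps):
        x = rng.choices(scale, weights=weights, k=N)
        y = rng.choices(scale, weights=weights, k=N)
        d = [xi - yi for xi, yi in zip(x, y)]
        sd = stdev(d)
        if sd == 0:
            continue  # no variation in differences: count as no rejection
        t = mean(d) / (sd / sqrt(N))
        if abs(t) > T_CRIT_DF99:
            rejections += 1
    return rejections / reps


# A distribution centred on the scale, and one piled up at the low end.
for label, weights in [("central", [10, 20, 40, 20, 10]),
                       ("near floor", [80, 12, 5, 2, 1])]:
    print(label, rejection_rate(weights))
```

Rejection rates close to 0.05 in both scenarios would support the robustness claim for that setting; a realistic study would also need to model the correlation between paired responses and vary the sample size.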