Solved – How to rank the results of questions with categorical answers

categorical data, scales, survey

I'm working with the results of a survey which has multiple questions. All answers (in this case) are categorical and ordinal (such as very unhappy, unhappy, neutral, happy, very happy).

I'm looking for a way to sort the questions from those with the "worst results" to those with the "best results". Getting the extremes is fairly easy visually. If I plot the distribution of answers for each question, I can identify which questions have lots of 'good' answers (the distribution is negatively skewed) and which have lots of 'bad' answers (a positively skewed histogram). So picking out the extremes is easy, although how clearly they stand out depends on the data.

Quantitatively however, I don't know what to do. Since the answers are on an ordinal, but not an interval scale, I don't know how to calculate an aggregate number for each question. Perhaps giving a numerical value to each category (such as -2, -1, 0, 1 or 2) and summing up the results might work if there's nothing better, but I do realize that mathematically this is not accurate as this is not an interval scale.

Oh, I'm not a statistician, just a programmer. I hope there is a reasonable solution to this; I imagine it's a fairly common question with categorical data.

Thanks in advance.

PS I use R in case there is something built-in.

Best Answer

If all your questions have the same response scale and they are standard Likert items, scoring the response categories 1, 2, 3, 4, 5 and taking the mean for each item is generally fine.
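As a rough illustration of the scoring just described, here is a minimal sketch in Python (chosen only for illustration; the same logic carries over to R). The question ids and answers are made-up example data, and the helper names `mean_scores` and `rank_worst_to_best` are hypothetical.

```python
from statistics import mean

# Ordered response categories, scored 1 (worst) to 5 (best).
LEVELS = ["very unhappy", "unhappy", "neutral", "happy", "very happy"]
SCORE = {label: i for i, label in enumerate(LEVELS, start=1)}

# Hypothetical survey data: question id -> list of raw answers.
responses = {
    "Q1": ["happy", "very happy", "neutral", "happy"],
    "Q2": ["unhappy", "very unhappy", "neutral", "unhappy"],
    "Q3": ["neutral", "happy", "unhappy", "neutral"],
}

def mean_scores(data):
    """Return {question: mean of the 1-5 scores of its answers}."""
    return {q: mean(SCORE[a] for a in answers) for q, answers in data.items()}

def rank_worst_to_best(scores):
    """Return question ids sorted from lowest to highest mean score."""
    return sorted(scores, key=scores.get)

item_means = mean_scores(responses)
ranking = rank_worst_to_best(item_means)
print(ranking)
```

With this toy data the means are 4, 2 and 3 for Q1, Q2 and Q3, so the worst-to-best order is Q2, Q3, Q1.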

You can investigate the robustness of the rank ordering by experimenting with different scoring schemes (e.g., 0, 0, 0, 1, 1 is common where you want to assess the percentage who are happy or very happy, or who agree or strongly agree). In my experience, such variants in scoring will give you almost identical question orderings. You could also explore optimal scaling principal components or some form of polytomous IRT approach if you wanted to be sophisticated.
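One way to run that robustness check is to score the same answers under both schemes and compare the resulting orderings. The sketch below is in Python purely for illustration, uses invented data, and the names `item_scores` and `ordering` are hypothetical.

```python
# Ordered response categories, worst to best.
LEVELS = ["very unhappy", "unhappy", "neutral", "happy", "very happy"]

# Two scoring schemes over the same categories.
MEAN_SCORING = dict(zip(LEVELS, [1, 2, 3, 4, 5]))     # usual 1-5 scoring
TOP_TWO_SCORING = dict(zip(LEVELS, [0, 0, 0, 1, 1]))  # proportion happy or very happy

# Hypothetical survey data: question id -> list of raw answers.
responses = {
    "Q1": ["happy", "very happy", "neutral", "happy"],
    "Q2": ["unhappy", "very unhappy", "neutral", "unhappy"],
    "Q3": ["neutral", "happy", "unhappy", "neutral"],
}

def item_scores(data, scoring):
    """Average the scored answers of each question under a given scheme."""
    return {
        q: sum(scoring[a] for a in answers) / len(answers)
        for q, answers in data.items()
    }

def ordering(scores):
    """Question ids from lowest to highest score (ties keep input order)."""
    return sorted(scores, key=scores.get)

order_mean = ordering(item_scores(responses, MEAN_SCORING))
order_top2 = ordering(item_scores(responses, TOP_TWO_SCORING))
print(order_mean, order_top2, order_mean == order_top2)
```

On this toy data both schemes give the same order. With real data, the 0/1 scheme produces more ties, so small differences between the two orderings near tied items are to be expected.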

To present the results, a table with three columns would be fine: rank, item text, mean. You could also show the same thing as a plot, with the questions on the x axis and the mean on the y axis.
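A three-column table of that kind could be produced along these lines. This is a minimal Python sketch for illustration only: the item texts and means are invented, and `ranked_rows` and `format_table` are hypothetical helper names.

```python
# Hypothetical item texts with their mean scores (1-5 scoring).
item_means = {
    "How satisfied are you with your workload?": 2.0,
    "How satisfied are you with your team?": 4.0,
    "How satisfied are you with your tools?": 3.0,
}

def ranked_rows(means):
    """Return (rank, item text, mean) tuples, worst mean first."""
    ordered = sorted(means.items(), key=lambda kv: kv[1])
    return [(rank, text, m) for rank, (text, m) in enumerate(ordered, start=1)]

def format_table(rows):
    """Render the rows as a plain-text table with aligned columns."""
    width = max(len(text) for _, text, _ in rows)
    lines = [f"{'rank':>4}  {'item':<{width}}  {'mean':>5}"]
    for rank, text, m in rows:
        lines.append(f"{rank:>4}  {text:<{width}}  {m:>5.2f}")
    return "\n".join(lines)

rows = ranked_rows(item_means)
print(format_table(rows))
```

Rank 1 here is the item with the lowest mean, matching the "worst results first" ordering asked for in the question. Reverse the sort if you would rather list the best items first.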