Solved – Parametric tests and Likert Scales (Ordinal data) – Two different views

Tags: assumptions, likert, ordinal-data

The following articles reach quite different conclusions, and I am starting to believe there is no clear answer to this problem. Their conclusions are quoted below; the first author is responding to the second.
My question is: which approach is appropriate in the following situation? We want to analyse Likert scales in social research, MANOVA fits our research design (two or more DVs based on Likert scales), and we have N = 180, but we face these two contradictory opinions.

First Article:

Parametric statistics can be used with Likert data, with small sample
sizes, with unequal variances, and with non-normal distributions, with
no fear of "coming to the wrong conclusion". These findings are
consistent with empirical literature dating back nearly 80 years. The
controversy can cease (but likely won't).

  • Norman, Geoff. "Likert scales, levels of measurement and the “laws” of statistics." Advances in Health Sciences Education 15.5 (2010): 625-632.
    DOI: 10.1007/s10459-010-9222-y

Second article:

(…)the researcher should decide what level of measurement is in use
(to paraphrase, if it is an interval level, for a score of 3, one
should be able to answer the question "3 what?"); non-parametric tests
should be employed if the data is clearly ordinal, and if the
researcher is confident that the data can justifiably be classed as
interval, attention should nevertheless be paid to the sample size and
to whether the distribution is normal.

  • Jamieson, S. (2004). Likert scales: how to (ab)use them. Medical education, 38(12), 1217-1218.
    DOI: 10.1111/j.1365-2929.2004.02012.x

Best Answer

One way I approach this is not to take people's word for it, based on what appear to be either their beliefs or precedent, but to try it out and see whether, in your case, it matters in a way you care about.

Here's a simple example: a 5-point Likert scale with a uniform distribution, 100 people per group, and we'll do a two-sample t-test. I'll repeat this 10000 times when the null hypothesis is true (i.e. there is no difference).

> mean(sapply(1:10000, function(x) { 
    t.test(sample(1:5, 100, TRUE), sample(1:5, 100, TRUE))$p.value 
  } ) < 0.05)

[1] 0.0499

It appears that I get a significant value 4.99% of the time. Given that I expect a significant value 5% of the time, it does not appear that violating the assumptions of normality and interval measurement has had any effect on my results - at least in terms of type I errors. (There might be power issues, of course.)
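To put a rough number on the power question, here is a minimal sketch of my own (not part of the original answer): I assume one group's response probabilities are mildly shifted toward higher categories, and compare the rejection rates of the t-test and the Wilcoxon rank-sum test under that alternative. The shifted probabilities are illustrative assumptions, not anything from the articles.

# Rough power check under an assumed alternative: group 2 leans toward
# higher categories (illustrative probabilities, my own choice).
set.seed(1)
p2 <- c(0.10, 0.15, 0.20, 0.25, 0.30)
power_t <- mean(sapply(1:10000, function(x) {
  t.test(sample(1:5, 100, TRUE),
         sample(1:5, 100, TRUE, prob = p2))$p.value
}) < 0.05)
power_w <- mean(sapply(1:10000, function(x) {
  suppressWarnings(  # ties in Likert data rule out exact p-values
    wilcox.test(sample(1:5, 100, TRUE),
                sample(1:5, 100, TRUE, prob = p2))$p.value)
}) < 0.05)
c(t = power_t, wilcoxon = power_w)

If the two rates come out close, the measurement-level worry has little practical bite for this design; if the Wilcoxon test clearly wins, that is evidence the ordinal view matters here.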

If someone has a specific criticism, you can investigate and see if it's an issue.

Here's another example: Now I have 5 people in one group, and 100 in the other.

> mean(sapply(1:10000, function(x) { 
    t.test(sample(1:5, 5, TRUE), sample(1:5, 100, TRUE))$p.value 
  } ) < 0.05)

[1] 0.0733

Now I have a 7.3% type I error rate. This is probably enough to worry about.
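As a side check (my addition, not from the original answer), one might ask whether the nonparametric test the second article recommends holds its level better in this same unbalanced setup:

mean(sapply(1:10000, function(x) {
  suppressWarnings(  # ties in Likert data rule out exact p-values
    wilcox.test(sample(1:5, 5, TRUE), sample(1:5, 100, TRUE))$p.value)
}) < 0.05)

Whatever value this prints, comparing it against the 7.3% above tells you whether switching tests, rather than rebalancing groups, fixes the problem.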

What about 5 per group?

> mean(sapply(1:10000, function(x) { 
    t.test(sample(1:5, 5, TRUE), sample(1:5, 5, TRUE))$p.value 
  } ) < 0.05)

Now a 4.5% significance rate under the null - the test is slightly conservative, which implies a slight loss of power, but I prefer that (a lot) over an inflated type I error rate.
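Finally, the same simulation logic extends directly to the design in the question. A minimal sketch under assumptions of my own: 90 respondents per group (N = 180), two independent, uniformly distributed 5-point Likert DVs, and Pillai's trace from manova() under the null:

# Null simulation for the question's design: N = 180 (assumed 90 per
# group), two Likert DVs, type I error rate of MANOVA (Pillai's trace).
set.seed(1)
mean(sapply(1:10000, function(x) {
  g <- factor(rep(1:2, each = 90))
  y <- cbind(sample(1:5, 180, TRUE), sample(1:5, 180, TRUE))
  summary(manova(y ~ g))$stats[1, "Pr(>F)"]
}) < 0.05)

Real Likert DVs are usually correlated, so a more faithful check would simulate correlated ordinal responses (e.g. by thresholding correlated normals), but even this crude version shows whether MANOVA's type I error rate survives your particular N and scale.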