Solved – 4-point and 5-point Likert scales in a new questionnaire


When developing a questionnaire, is it acceptable to use a 4-point Likert scale for some constructs and a 5-point Likert scale for others?

I have 7 constructs with a 4-point scale, 2 constructs with a 5-point scale, and one construct with a different 4-point scale.

Will this affect the validation process when testing for construct validity? And which analysis would be best – e.g. polychoric correlations?

Thank you.

Best Answer

The number of categories is not the only thing to consider about your Likert scales.

Do you offer a neutral category? Compare: "Strongly disagree - Disagree - Agree - Strongly Agree" vs. "Strongly disagree - Disagree - No opinion - Agree - Strongly Agree". The first scale forces a response as a by-product, which may or may not be appropriate.

Do you label them with numbers? If you do, is the neutral category a zero? Compare: "1 Don't like it at all / 2 / 3 / 4 / 5 Like it a lot" vs. "-2 Don't like it at all / -1 / 0 / 1 / 2 Like it a lot". The latter conveys a swing from a negative to a positive attitude, while the former does not.

If you provide text with the categories, are they equidistant? The Netflix 5-star scale sucks, in my opinion: it has only 1 star for "Don't like"s, and between 3 and 5 stars for various degrees of "Like"s. Our department teaching evaluations were like that, too: 1 for "Poor", 2 for "Adequate", 3 for "Good", 4 for "Excellent", 5 for "Outstanding", and you basically had to score 4 and above. That's where most of the inconsistencies in validation will likely come from, as the distance between 4 and 5 is not nearly the same as the distance between 1 and 2.

Update: As far as reliability and validity are concerned, I am not sure what the standard practices are regarding Likert scales. You can probably present the analysis of both the moment (Pearson) covariance matrix and the polychoric correlation matrix, to demonstrate the factor structure, the reliability of individual items, and the composite reliability of the factor. The moment-based analysis will understate the reliability of the underlying continuous scales, as much work has shown; the polychoric correlations will get these continuous scales right, but the continuous scales are not what you are actually measuring. So the true reliability of your measurement process is somewhere in between. You can also demonstrate discriminant validity with these internal measurements (different factors correspond to different concepts, and hence their correlation is less than 1). To demonstrate external validity in its strongest form, you would need some additional behavioral variables coming from a substantive model. E.g., if a certain physical activity is "difficult" to do in old age, you would expect it to be done "rarely".
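
To make the contrast between the two matrices concrete, here is a minimal simulation sketch, not a recommended analysis pipeline. It assumes two latent standard-normal traits with a true correlation of 0.6, cuts one into a 4-point item and the other into a 5-point item at made-up thresholds, and then compares the Pearson correlation of the category codes with a simple two-step polychoric estimate (thresholds from the marginal shares, then a one-dimensional likelihood search). All names, cut points, and the sample size are illustrative assumptions; in practice you would use an established routine from your statistical package rather than this hand-rolled one.

```python
import math
import random
from statistics import NormalDist

STD = NormalDist()   # standard normal distribution
BOUND = 8.0          # finite stand-in for +/- infinity


def pearson(x, y):
    """Ordinary moment-based (Pearson) correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate_latent(n, rho, seed):
    """Draw n pairs from a standard bivariate normal with correlation rho."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        xs.append(z1)
        ys.append(rho * z1 + math.sqrt(1 - rho ** 2) * z2)
    return xs, ys


def to_codes(values, cuts):
    """Turn continuous values into Likert codes 1..len(cuts)+1."""
    return [1 + sum(v > c for c in cuts) for v in values]


def thresholds(codes, k):
    """Step 1: cut points implied by the observed category shares.
    Assumes every one of the k categories is actually used."""
    n, cum, cuts = len(codes), 0, []
    for cat in range(1, k):
        cum += codes.count(cat)
        cuts.append(STD.inv_cdf(cum / n))
    return [-BOUND] + cuts + [BOUND]


def cell_prob(a1, a2, b1, b2, rho, steps=60):
    """P(a1 < X < a2, b1 < Y < b2) under a standard bivariate normal,
    integrated over x with the midpoint rule."""
    s = math.sqrt(1 - rho ** 2)
    h = (a2 - a1) / steps
    total = 0.0
    for i in range(steps):
        x = a1 + (i + 0.5) * h
        upper = STD.cdf((b2 - rho * x) / s)
        lower = STD.cdf((b1 - rho * x) / s)
        total += STD.pdf(x) * (upper - lower)
    return total * h


def polychoric(x_codes, y_codes, kx, ky):
    """Step 2: choose rho to maximise the likelihood of the cross-table."""
    tx, ty = thresholds(x_codes, kx), thresholds(y_codes, ky)
    counts = {}
    for pair in zip(x_codes, y_codes):
        counts[pair] = counts.get(pair, 0) + 1

    def neg_loglik(rho):
        return -sum(
            c * math.log(max(cell_prob(tx[i - 1], tx[i], ty[j - 1], ty[j], rho), 1e-300))
            for (i, j), c in counts.items()
        )

    # Golden-section search for the minimum over rho.
    lo, hi = -0.95, 0.95
    g = (math.sqrt(5) - 1) / 2
    for _ in range(30):
        m1, m2 = hi - g * (hi - lo), lo + g * (hi - lo)
        if neg_loglik(m1) < neg_loglik(m2):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2


latent_x, latent_y = simulate_latent(n=10_000, rho=0.6, seed=1)
item4 = to_codes(latent_x, cuts=(-1.0, 0.0, 1.0))        # hypothetical 4-point item
item5 = to_codes(latent_y, cuts=(-1.5, -0.5, 0.5, 1.5))  # hypothetical 5-point item

r_latent = pearson(latent_x, latent_y)
r_codes = pearson(item4, item5)
r_poly = polychoric(item4, item5, kx=4, ky=5)

print(f"Pearson, latent scores:   {r_latent:.3f}")
print(f"Pearson, Likert codes:    {r_codes:.3f}")
print(f"Polychoric, Likert codes: {r_poly:.3f}")
```

Under these assumptions the Pearson correlation of the codes should come out below the latent correlation, while the polychoric estimate should land close to it. That attenuation is what drives the understated reliability in the moment-based analysis. The polychoric figure, in turn, describes the latent traits rather than the coarse responses you actually collect, and it depends on the bivariate-normality assumption holding.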