Solved – Data transformation for Principal Components Analysis from different Likert scales

Tags: data transformation, likert, pca, psychometrics, scales

I have data from a survey comprising several measures that used different Likert-type response scales (4-, 5-, and 6-point). I would like to run a principal components analysis on the items from these measures. It seems to me that I need to transform the data in some way so that all items carry equal weight in the analysis. However, I am uncertain how to proceed.

Best Answer

As suggested by @whuber, you can "abstract" the scale effect by working with a standardized version of your data, which amounts to running the PCA on the correlation matrix rather than the covariance matrix. If you are willing to assume that each item is measured on an interval scale (i.e., the distance between any two adjacent response categories has the same meaning for every respondent), then linear (Pearson) correlations are fine. You can also compute polychoric correlations, which better account for the discretization of an underlying latent variable (see the R package polycor). Note that this is considerably more computationally intensive, but it works quite well in R.
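To make the standardization route concrete, here is a minimal sketch in Python with NumPy (not the R tooling mentioned above). The simulated survey, the item count, the noise level, and the helper `to_likert` are all assumptions made up for illustration. The sketch only covers the Pearson-correlation case; polychoric correlations are not computed here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical survey: 200 respondents, one latent trait,
# six items answered on 4-, 5-, and 6-point scales (all assumed).
n = 200
latent = rng.normal(size=n)
n_points = [4, 4, 5, 5, 6, 6]


def to_likert(values, k):
    """Cut a continuous score into k ordered categories coded 1..k."""
    cuts = np.quantile(values, np.linspace(0, 1, k + 1)[1:-1])
    return np.digitize(values, cuts) + 1


items = np.column_stack(
    [to_likert(latent + rng.normal(scale=0.8, size=n), k) for k in n_points]
).astype(float)

# Standardize every item to z-scores so that a 6-point item does not
# outweigh a 4-point item merely because of its wider numeric range.
z = (items - items.mean(axis=0)) / items.std(axis=0, ddof=1)

# PCA on standardized items is an eigendecomposition of the
# (Pearson) correlation matrix.
corr = np.corrcoef(items, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(corr)

# eigh returns eigenvalues in ascending order; sort them descending.
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

component_scores = z @ eigvecs          # respondent scores on each component
explained = eigvals / eigvals.sum()     # proportion of variance per component

print(np.round(explained, 3))
```

Because every standardized item has unit variance, the eigenvalues sum to the number of items, so each item contributes equally to the total variance being decomposed.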

Another possibility is to combine optimal scaling with your PCA, as implemented in the homals package. The idea is to find a suitable non-linear transformation of each scale; this is very nicely described by Jan de Leeuw in the accompanying vignette or the JSS article, "Gifi Methods for Optimal Scaling in R: The Package homals". Several examples are included.
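The idea behind optimal scaling can be sketched with a stripped-down alternating least squares loop. What follows is an illustrative Python/NumPy sketch, not the homals API: it extracts a single dimension, treats every item as nominal (the ordinal restrictions that homals can impose would need an extra monotone regression step, omitted here), and runs on simulated data. The function name `homogeneity_scores` and all data-generating choices are assumptions.

```python
import numpy as np


def homogeneity_scores(data, n_iter=500, tol=1e-10, seed=0):
    """One-dimensional homogeneity analysis by alternating least squares.

    data: (n, m) integer array of category codes, one column per item.
    Returns standardized object scores and, per item, a dict that maps
    each category code to its quantification.
    """
    n, m = data.shape
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    x = (x - x.mean()) / x.std()
    quantifications = []
    for _ in range(n_iter):
        quantifications = []
        x_new = np.zeros(n)
        for j in range(m):
            cats, codes = np.unique(data[:, j], return_inverse=True)
            # Category quantification: mean object score of the
            # respondents who chose that category.
            y = np.bincount(codes, weights=x) / np.bincount(codes)
            quantifications.append(dict(zip(cats.tolist(), y.tolist())))
            x_new += y[codes]
        # Object score: average of the respondent's quantified items,
        # then re-centred and rescaled to unit variance.
        x_new /= m
        x_new -= x_new.mean()
        x_new /= x_new.std()
        converged = np.max(np.abs(x_new - x)) < tol
        x = x_new
        if converged:
            break
    return x, quantifications


# Simulated items on 4-, 5-, and 6-point scales (assumed, for illustration).
rng = np.random.default_rng(1)
n = 200
latent = rng.normal(size=n)
columns = []
for k in [4, 4, 5, 5, 6, 6]:
    noisy = latent + rng.normal(scale=0.8, size=n)
    cuts = np.quantile(noisy, np.linspace(0, 1, k + 1)[1:-1])
    columns.append(np.digitize(noisy, cuts) + 1)
items = np.column_stack(columns)

scores, quants = homogeneity_scores(items)
for j, q in enumerate(quants):
    print(j, {c: round(v, 2) for c, v in q.items()})
```

The printed quantifications replace the raw codes 1..k of each item with estimated numeric values, so the spacing between categories is learned from the data rather than fixed by the original coding, and the number of response options no longer determines an item's weight.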

For a more thorough understanding of this approach with any factorial method, see the work of Yoshio Takane in the 1980s.

Similar points were raised by @Jeromy and @mbq on related questions: "Does it ever make sense to treat categorical data as continuous?" and "How can I use optimal scaling to scale an ordinal categorical variable?"