Solved – Correlation of latent variables: Sum-scores vs. SEM correlation

confirmatory-factor, factor analysis, structural-equation-modeling

I use a set of about 20 attitudinal items and confirmatory factor analysis (CFA). Loadings and model fit are sufficient. In the next step, I want to test for correlations between these latent factors, so I calculate factor scores based on the CFA. And here comes my question: if I calculate correlations between these factor scores, the correlations are quite high (up to .7). If I instead calculate sum scores (adding up the items) and correlate those, I get only a medium correlation. Why is that? I am aware that the items are weighted in the CFA factor scores while they are unweighted in the sum scores. However, I do not think that this can be the reason for the different correlations.

General framework: N > 8000, Likert scale, attitude items.

This is a similar topic, but it does not address the possible differences between the two methods:
Correlational study or ordinal data using 5-point Likert scale

edited to make the question more clear

Best Answer

I assume that you are thinking of a simple structure in which each of the 20 items loads on exactly 1 factor. Suppose items 1-10 load on factor 1, and items 11-20 load on factor 2. Then, for each individual, you could average items 1-10 and average items 11-20, and calculate the correlation between the two averages. Alternatively, you can estimate factor scores for the factors and obtain an estimate of the correlation that way. If you do a CFA, allowing the correlation between factors to be free, the software will estimate that parameter for you. Is this your question?

So yes ... these two statistics will be different. You can think of each item as being a noisy estimator of factor 1 or factor 2 (as appropriate). Taking the average will reduce the noise, but the averages are still noisy observations of the factors. Adding independent noise to a pair of variables shrinks their correlation toward zero, so the first statistic (the correlation of the averages or sum scores) will be biased downwards as an estimate of the correlation you seek. This is true even if the factor loadings are all the same.
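The attenuation argument can be checked with a small simulation. This is only a sketch under assumptions that are mine, not the poster's: continuous (not Likert) items, 10 items per factor, equal loadings of 0.6, a true factor correlation of 0.7, and standardized items. Under those assumptions the sum-score correlation should come out near 0.7 × kλ² / (kλ² + (1 − λ²)), which is about 0.59.

```python
import math
import random


def pearson(x, y):
    """Plain Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate(n=8000, k=10, loading=0.6, rho=0.7, seed=1):
    """Simulate two correlated factors and k noisy items per factor.

    All parameter values are illustrative assumptions, not taken from the post.
    Returns the true factor values and the per-person sum scores.
    """
    rng = random.Random(seed)
    # Unique (error) SD chosen so that every item has variance 1.
    noise_sd = math.sqrt(1 - loading ** 2)
    f1, f2, s1, s2 = [], [], [], []
    for _ in range(n):
        a = rng.gauss(0, 1)
        # Construct b so that corr(a, b) = rho.
        b = rho * a + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1)
        f1.append(a)
        f2.append(b)
        # Sum score = sum of k items, each item = loading * factor + unique noise.
        s1.append(sum(loading * a + noise_sd * rng.gauss(0, 1) for _ in range(k)))
        s2.append(sum(loading * b + noise_sd * rng.gauss(0, 1) for _ in range(k)))
    return f1, f2, s1, s2


k, loading, rho = 10, 0.6, 0.7
f1, f2, s1, s2 = simulate(k=k, loading=loading, rho=rho)

r_factors = pearson(f1, f2)  # correlation of the (unobservable) factors
r_sums = pearson(s1, s2)     # correlation of the observable sum scores

# Theoretical attenuation: rho times the reliability of a k-item sum score.
reliability = k * loading ** 2 / (k * loading ** 2 + (1 - loading ** 2))
expected_sum_corr = rho * reliability

print(f"factor correlation:     {r_factors:.3f}")
print(f"sum-score correlation:  {r_sums:.3f}")
print(f"expected after attenuation: {expected_sum_corr:.3f}")
```

With more items or higher loadings the reliability term approaches 1 and the gap between the two correlations shrinks, but it does not vanish for any finite number of noisy items.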

In confirmatory factor analysis, you estimate the various components of the model (unique variances, loadings, factor covariances) through maximum likelihood (or some other method), so you end up directly estimating the parameter of interest (the factor correlation) rather than a correlation between noisy proxies for the factors.
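To make the components named above concrete, here is the standard CFA covariance structure written out. The notation is my own choice and not from the question: Λ holds the loadings, Φ the factor covariance matrix, Θ the (usually diagonal) unique variances, and ρ the factor correlation, assuming two factors with variances fixed to 1.

```latex
x = \Lambda \xi + \varepsilon,
\qquad
\Sigma = \operatorname{Cov}(x) = \Lambda \Phi \Lambda^{\top} + \Theta,
\qquad
\Phi = \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}
```

Fitting the model means choosing Λ, Φ and Θ so that the implied Σ matches the sample covariance matrix of the items as closely as possible. Because ρ is a free element of Φ, it is estimated separately from the noise in Θ, which is why it is not attenuated the way a sum-score correlation is.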

As a bonus, you can still get the covariance of the factors in a more complex model, where items load on more than 1 factor. Sum scores would totally not work in that case, but the covariance of the factors will emerge from the optimization.