Solved – Subscales (not items) as indicators of latent variables in SEM

scales, structural-equation-modeling

I have collected questionnaire data from a sample (n=200). Each participant completed 5 questionnaires that capture different behaviours, and each questionnaire has between 10 and 27 items.

I am planning to use SEM to construct a model that investigates whether scores on 4 of my questionnaires (IVs) predict scores on the 5th questionnaire (DV). The IVs and DV will be treated as latent variables in the model, each with a number of indicators attached.

However, because my questionnaires have so many items, I want to reduce them in some way so that I have fewer indicators in my SEM. I'm new to this, so I'm wondering whether my rationale holds.

What I am considering at the moment is conducting an exploratory factor analysis and then a confirmatory factor analysis on each of the 5 questionnaires separately, to establish how many factors the items capture (let's say, for example, that magically each questionnaire has 3 underlying factors). I would then sum the scores of the items associated with each of the 3 factors separately, so each participant has 3 scores per questionnaire rather than 1 total sum score.

These 3 scores would then serve as the 3 indicators of that questionnaire's latent variable in the final predictive SEM. The model would therefore have 5 latent variables, each measured by 3 indicators.
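To make the scoring step concrete, here is a minimal sketch of summing items within each factor for one participant on one questionnaire. The item names (`q1` to `q10`), the factor labels, and the item-to-factor assignment are all hypothetical; in practice the assignment would come from the EFA/CFA results.

```python
# Hypothetical item-to-factor assignment for one 10-item questionnaire.
# In a real analysis this mapping would be taken from the EFA/CFA solution.
SUBSCALES = {
    "factor_1": ["q1", "q2", "q3", "q4"],
    "factor_2": ["q5", "q6", "q7"],
    "factor_3": ["q8", "q9", "q10"],
}


def subscale_scores(responses, subscales=SUBSCALES):
    """Sum one participant's item responses within each subscale."""
    return {
        name: sum(responses[item] for item in items)
        for name, items in subscales.items()
    }


# Made-up responses for a single participant (illustration only).
participant = {
    "q1": 4, "q2": 3, "q3": 5, "q4": 2,
    "q5": 1, "q6": 2, "q7": 3,
    "q8": 4, "q9": 5, "q10": 1,
}

scores = subscale_scores(participant)
print(scores)
```

Applying the same function to every participant and every questionnaire gives the 3 subscale scores per questionnaire that would enter the SEM as indicators.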

I guess the central question is: if you have more than 10 items associated with a latent variable, is it acceptable to use derived subscale scores (sums of the items belonging to each factor) as indicators in SEM rather than the original items?

Best Answer

The short answer is yes.

This is frequently done in my field (psychology). With every item as an indicator, your model will be too complex and will (probably) not converge. You won't be testing assumptions such as unidimensionality that you might like to test, but that's OK: you might not be interested in them, or they might be irrelevant (your indicators might be causal).

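An approach that's sometimes recommended (it's recommended far more than it's used) is to use your total score as a single indicator and fix its measurement error using the reliability: fix the loading to 1 and the error variance to (1 − reliability) × the variance of the total score. In standardized terms, this is the same as fixing the loading to the square root of the reliability.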
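As a rough sketch of the arithmetic behind the single-indicator approach: one common formulation fixes the loading to 1 and the indicator's error variance to (1 − reliability) × the variance of the total score. The example below assumes Cronbach's alpha as the reliability estimate (the answer does not say which estimate to use) and uses made-up responses for 3 items and 5 participants.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha from a list of per-item response lists.

    Each inner list holds one item's responses, in the same
    participant order across items.
    """
    k = len(items)
    totals = [sum(vals) for vals in zip(*items)]
    item_var_sum = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - item_var_sum / variance(totals))


def fixed_error_variance(totals, reliability):
    """Error variance to fix for a single total-score indicator."""
    return (1 - reliability) * variance(totals)


# Made-up responses: 3 items, 5 participants (illustration only).
items = [
    [1, 2, 3, 4, 5],
    [2, 2, 3, 5, 5],
    [1, 3, 3, 4, 4],
]
totals = [sum(vals) for vals in zip(*items)]

alpha = cronbach_alpha(items)
err_var = fixed_error_variance(totals, alpha)
print(f"alpha = {alpha:.3f}, fixed error variance = {err_var:.3f}")
```

The resulting value would then be supplied as a fixed (not estimated) error variance for the total-score indicator in whatever SEM software you use.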
