Solved – Is it acceptable to have only two (or fewer) items (variables) loading on a factor in factor analysis?

assumptions, factor analysis, references, spss

I have a set of 20 variables that I have put through factor analysis in SPSS. For the purposes of the research, I need to develop 6 factors. SPSS showed that 8 of the 20 variables either had low loadings or loaded about equally on several factors, so I removed them. The remaining 12 variables load in pairs on the 6 factors, which is exactly the structure I wanted. However, one of the professors working with me wants me to justify why (or under what conditions) it is appropriate to keep only 2 items per factor, since it is commonly held that factor analysis results are useful only when 3 or more items load on each factor.

Can anyone help me out with this issue, preferably with a published reference as well?

Best Answer

Whether two or three items per factor are enough is a question of identification of your CFA (confirmatory factor analysis) model.

Let us for simplicity assume that the model is identified by setting the variance of each factor to 1. Assume also that there are no correlated measurement errors.

A single-factor model with two items has two loadings and two error variances to estimate, 4 parameters in total, but the variance-covariance matrix has only 3 distinct entries (two variances and one covariance), so you don't have enough information to estimate the four parameters that you need.
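The counting argument can be sketched in a few lines of Python. This is only an illustration under the assumptions stated above (factor variance fixed to 1, no correlated measurement errors); the function name `single_factor_df` is made up for this example. Keep in mind that counting is a necessary condition for identification, not a sufficient one.

```python
def single_factor_df(n_items):
    """Degrees of freedom of a one-factor model.

    Assumes the factor variance is fixed to 1 and the measurement
    errors are uncorrelated, as in the text above.
    """
    # Distinct entries of the variance-covariance matrix:
    # n_items variances plus n_items * (n_items - 1) / 2 covariances.
    distinct_entries = n_items * (n_items + 1) // 2
    # Free parameters: one loading and one error variance per item.
    free_parameters = 2 * n_items
    return distinct_entries - free_parameters


# Two items: 3 distinct entries minus 4 parameters.
print(single_factor_df(2))  # -1
```

A negative value means there are fewer pieces of information than parameters, so the two-item single-factor model cannot be estimated on its own.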

A single-factor model with three items has three loadings and three error variances. The variance-covariance matrix has six distinct entries, and careful analytic examination shows that the model is exactly identified: you can algebraically express the parameter estimates as functions of the variance-covariance matrix entries. With more than three items on a single factor, you have an overidentified model (more distinct variance-covariance entries than free parameters, i.e. positive degrees of freedom), which usually means you are good to go.
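To make the "algebraically express" step concrete, here is a minimal sketch. With the factor variance fixed to 1, the covariance of items i and j is the product of their loadings, so each squared loading can be recovered from the three covariances. The helper names and the numeric loadings are invented for illustration, and the sketch assumes all loadings are positive (the sign of a factor is arbitrary).

```python
import math


def implied_cov(loadings, error_vars):
    """Model-implied covariance matrix: lambda_i * lambda_j, plus theta_i on the diagonal."""
    p = len(loadings)
    return [
        [loadings[i] * loadings[j] + (error_vars[i] if i == j else 0.0)
         for j in range(p)]
        for i in range(p)
    ]


def solve_three_indicator(S):
    """Recover loadings and error variances from a 3x3 covariance matrix.

    Assumes one factor with variance 1, uncorrelated errors,
    positive loadings, and non-zero covariances.
    """
    s12, s13, s23 = S[0][1], S[0][2], S[1][2]
    # s12 * s13 / s23 = (l1*l2)(l1*l3)/(l2*l3) = l1**2, and similarly for the others.
    loadings = [
        math.sqrt(s12 * s13 / s23),
        math.sqrt(s12 * s23 / s13),
        math.sqrt(s13 * s23 / s12),
    ]
    # Each error variance is the item variance minus the squared loading.
    error_vars = [S[i][i] - loadings[i] ** 2 for i in range(3)]
    return loadings, error_vars


# Hypothetical population values, chosen only for the demonstration.
true_loadings = [0.8, 0.7, 0.6]
true_error_vars = [0.36, 0.51, 0.64]
S = implied_cov(true_loadings, true_error_vars)
loadings, error_vars = solve_three_indicator(S)
print(loadings, error_vars)
```

Six distinct entries give exactly six parameters back, which is what "exactly identified" means here: the solution is unique but leaves no degrees of freedom to test the model.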

With more than one factor, the CFA model is always identified when each factor has 3 or more items (because a simple measurement model is identified for each factor, so, roughly speaking, you can get predictions for each factor and estimate their covariances from those). However, a CFA with two items per factor is also identified, provided that each factor has a non-zero covariance with at least one other factor in the population. (Otherwise, the factor in question drops out of the system, and you are left with a two-item single-factor model, which is not identified.) The proof of identification is rather technical and requires a good understanding of matrix algebra.
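The general proof is technical, but the simplest case (two correlated factors, two items each) can be sketched directly. This is an illustration under extra assumptions, not the general argument: factor variances are fixed to 1, errors are uncorrelated, each item loads on one factor only, and loadings are positive. The function names and numeric values are made up for the example. The key point is that the cross-factor covariances supply the information that a lone two-item factor lacks, and that the recovery breaks down when the factor covariance is zero.

```python
import math


def implied_cov_two_factor(lam, theta, phi):
    """Implied 4x4 covariance matrix; items 0,1 load on factor 1, items 2,3 on factor 2."""
    factor_of = [0, 0, 1, 1]
    S = [[0.0] * 4 for _ in range(4)]
    for i in range(4):
        for j in range(4):
            # Same factor: factor variance 1. Different factors: covariance phi.
            f = 1.0 if factor_of[i] == factor_of[j] else phi
            S[i][j] = lam[i] * lam[j] * f
        S[i][i] += theta[i]
    return S


def solve_two_indicator(S, tol=1e-12):
    """Recover loadings, error variances and factor covariance from S."""
    s12, s34 = S[0][1], S[2][3]
    s13, s14, s23 = S[0][2], S[0][3], S[1][2]
    if min(abs(s13), abs(s14), abs(s23)) < tol:
        raise ValueError("cross-factor covariances are zero: model not identified")
    # s13 / s23 = l1 / l2 and s12 = l1 * l2, so their product is l1**2.
    l1 = math.sqrt(s12 * s13 / s23)
    l2 = s12 / l1
    # s13 / s14 = l3 / l4 and s34 = l3 * l4, so their product is l3**2.
    l3 = math.sqrt(s34 * s13 / s14)
    l4 = s34 / l3
    phi = s13 / (l1 * l3)
    lam = [l1, l2, l3, l4]
    theta = [S[i][i] - lam[i] ** 2 for i in range(4)]
    return lam, theta, phi


# Hypothetical population values, chosen only for the demonstration.
true_lam = [0.8, 0.7, 0.6, 0.5]
true_theta = [0.36, 0.51, 0.64, 0.75]
S = implied_cov_two_factor(true_lam, true_theta, phi=0.3)
lam, theta, phi = solve_two_indicator(S)
print(lam, theta, phi)
```

Calling `solve_two_indicator` on a matrix built with `phi=0.0` raises the `ValueError`, which mirrors the remark above: an uncorrelated factor falls out of the system and its two-item model is not identified.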

Bollen (1989) fully and thoroughly discusses the issues of identification of CFA models in chapter 7. See p. 244 specifically regarding three- and two-indicator rules.