Feasible to do hierarchical CFA with only two first-order factors

factor-analysis, structural-equation-modeling

Kline (2011, p. 249) writes:

To identify a hierarchical CFA model, there must be at least three
first-order factors. Otherwise, the direct effects of the second-order
factor on the first-order factors or the disturbance variances may be
underidentified. Each first-order factor should have at least two
indicators.

  1. Kline presents this as a hard and fast rule. Are there any exceptions to this rule that could apply in a practical scenario (even if those exceptions are a bit farfetched)?

  2. Why are 3+ first-order factors required? If we forget about hierarchical factor analysis for a moment, I know that in a regular factor analysis having two indicators per factor and two factors is a technical minimum. That is, with two indicators per factor and two factors we might run into empirical underidentification or nonconvergence of iterative estimation, but usually it more or less works. I figured that since the second-order factor has no indicators, the first-order factors were in a way acting like indicators for the second-order factor. So maybe a model with one second-order factor and two first-order factors was kinda like a factor analysis model with two indicators and only one factor, which doesn't work. Is that an appropriate analogy for why it doesn't work?

  3. What should I do in a situation in which I 'want' to test a hierarchical factor model with a single second-order factor and only two first-order factors? Are such models inherently untestable, or is there some way around this?

Kline, R. B. (2011). Principles and practice of structural equation modeling. Guilford Publications.

Best Answer

It's not really a hard and fast rule. It's just that if you only have two first-order factors, your model will be equivalent to a model where those two factors are simply correlated. Your point 2 is correct.

It's just like having two measured variables and wanting to fit a single-factor model. If you have two variables (latent or measured), you have one covariance / correlation to account for, so you have only got one piece of information to work with.

Say that these two variables are correlated 0.64. You want to estimate two loadings, but you only have one piece of information (the correlation), so you can only estimate one parameter. The way to fix that is to constrain the two loadings to be equal. If you do that, the product of the two (standardized) loadings has to reproduce the correlation, so each loading will be sqrt(0.64) = 0.8.
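To make the arithmetic concrete, here is a minimal Python sketch. It assumes standardized variables and a single factor with unit variance, so the model-implied correlation is just the product of the two loadings. The function names and the candidate loading pairs are made up for illustration.

```python
import math

def equal_loading(r):
    """Standardized loading when both loadings are constrained equal (lambda**2 = r)."""
    if r < 0:
        raise ValueError("equal loadings cannot reproduce a negative correlation")
    return math.sqrt(r)

def implied_correlation(l1, l2):
    """Correlation implied by a one-factor model with standardized loadings l1 and l2."""
    return l1 * l2

r = 0.64
lam = equal_loading(r)

# Without the equality constraint, many loading pairs reproduce r equally well
# (illustrative values only), which is why the model is underidentified.
candidates = [(0.8, 0.8), (0.64, 1.0), (0.9, 0.64 / 0.9)]
fits = [math.isclose(implied_correlation(a, b), r) for a, b in candidates]

print(round(lam, 4), fits)
```

Every candidate pair fits the single correlation perfectly, so the data cannot choose between them; the equality constraint just picks one of them.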

So (to address point 3) you can fit the model, but it won't tell you anything you didn't already know, and you won't learn anything from it.

(I'd argue that three first-order factors doesn't tell you anything either: the second-order loadings are a transformation of the factor covariance matrix and are completely determined by it. You need four first-order factors to over-identify the second-order factor.)
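A rough sketch of that last point, again assuming standardized first-order factors, a second-order factor with unit variance, and all-positive correlations. The three correlations below are invented for the example, and the function names are mine. With three factors the second-order loadings can be solved for directly from the correlations, and a simple count of correlations minus loadings shows why four factors are needed before anything is left over to test.

```python
import math

def second_order_loadings(r12, r13, r23):
    """Solve l1*l2 = r12, l1*l3 = r13, l2*l3 = r23 for positive standardized loadings."""
    l1 = math.sqrt(r12 * r13 / r23)
    l2 = math.sqrt(r12 * r23 / r13)
    l3 = math.sqrt(r13 * r23 / r12)
    return l1, l2, l3

def second_order_df(k):
    """Factor correlations minus second-order loadings, for k first-order factors."""
    return k * (k - 1) // 2 - k

# Hypothetical factor correlations, chosen only for illustration.
l1, l2, l3 = second_order_loadings(0.56, 0.48, 0.42)
print(round(l1, 3), round(l2, 3), round(l3, 3))

# Negative: underidentified; zero: just-identified; positive: over-identified.
for k in (2, 3, 4):
    print(k, second_order_df(k))
```

The count here covers only the second-order part of the model. The disturbance variances are not counted separately because, under standardization, each one is fixed at one minus the squared loading.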