The thresholds are on a logit scale, so they are log-odds (just like in logistic regression, or ordinal logistic regression). Thresholds play the role of the intercept in a regular regression model: they give the expected log-odds of a response value when the predictors (including the latents) are all equal to zero.
If, in a model with continuous variables, you constrained the intercept of a variable to some incorrect value, this would hurt your model fit, but it shouldn't change the rest of the model. The same logic applies to thresholds.
Most of the time you can ignore the thresholds. You sometimes need to constrain one of them for identification. For example, to identify the mean of a latent variable, you can either constrain that mean to zero or constrain a threshold of an indicator (just as you constrain one of the loadings to identify the variance). The default in lavaan is to constrain the mean/intercept of the latent variable to zero.
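To make the two identification choices concrete, here is a lavaan sketch (the factor and item names are hypothetical):

```r
# Default identification: latent mean fixed to 0, all thresholds free
model_default <- "
  F =~ y1 + y2 + y3
"

# Alternative: free the latent mean and fix one threshold instead
model_alt <- "
  F =~ y1 + y2 + y3
  F ~ NA*1       # free the latent mean
  y1 | 0*t1      # fix the first threshold of y1 to zero
"
# fit <- cfa(model_alt, data = dat, ordered = c("y1", "y2", "y3"))
```

Both parameterizations should give the same fit; they just move the location of the latent scale around.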
You constrain thresholds when you want to make the means/intercepts of latent variables comparable. E.g. in a multiple-group model where you want to compare latent means, the default is to fix the means to zero in every group. That defeats the comparison, so instead you fix the mean to zero in one group only, free it in the others, and constrain the thresholds to be equal across groups.
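A hedged sketch of what that looks like in lavaan for a two-group model (the group and variable names are made up):

```r
# Thresholds constrained equal across groups; latent mean fixed to 0
# in group 1 and freely estimated in group 2, which makes the group
# means comparable
model <- "
  F =~ y1 + y2 + y3
  F ~ c(0, NA)*1
"
# fit <- cfa(model, data = dat, group = "grp",
#            ordered = c("y1", "y2", "y3"),
#            group.equal = c("loadings", "thresholds"))
```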
This is a well-studied issue: structures with seemingly good measurement quality get rejected by the standard measures of fit in a CFA. See McNeish, An & Hancock (2017) below. If I recall correctly, they suggest recalibrating our expectations for goodness-of-fit statistics.
One suggestion that has no bearing on your fit problem: drop FC. Single-indicator factors are a bad idea for many reasons. Drop it and settle for a two-factor structure, or keep C1 as an observed variable.
A good sign is that your inter-factor correlations are not too high, suggesting that the factors may indeed be distinguishable. Also worth noting: your SRMR is low, so on average your model-implied variance-covariance matrix may not be doing too badly at capturing the sample variance-covariance matrix. I find SRMR the least deceitful global fit index.
When faced with a situation like this, I think the most natural approach is to permit all items to load on all factors, except for a few marker items. A good reference is Ferrando & Lorenzo-Seva (2000). Since you have five items per factor, you can select two items per factor that you are confident load on a given factor. They act as markers for that factor: set their loadings on the other factor to 0, then estimate all other loadings freely. The hope is that the pattern of loadings for the remaining three items per factor follows theory: each item loads highly on the factor you think it should and lowly on the other factor. The marker items should also load highly on the factor you restricted them to.
This way, you permit cross-loadings (which always exist in reality), and you have the freedom to use common sense to judge whether the structure matches your theory, as in an EFA. And you still get tests of model fit. I do not know why this approach is not more popular.
In your example, assuming I select items A1 and A2 to be markers for FA and B1 and B2 to be markers for FB, then the lavaan syntax for the model I am describing would be something like:
"
FA =~ A1 + A2 + A3 + A4 + A5 + B3 + B4 + B5
FB =~ A3 + A4 + A5 + B1 + B2 + B3 + B4 + B5
"
If the resulting pattern of factor loadings matches your theory, that is a good sign. You can use congruence (cosine similarity) to evaluate how well the resulting structure matches the perfect structure of no cross-loadings in your original CFA. Ferrando and Lorenzo-Seva discuss this, and the R function `cosine` in the `lsa` package computes congruence.
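As a sketch, congruence between an estimated loading column and the hypothesized perfect pattern can be computed like this (the loading values below are made up for illustration):

```r
library(lsa)

# Estimated loadings of the eight items on FA, and the target
# "perfect structure" (1 = should load on FA, 0 = should not)
estimated <- c(0.72, 0.68, 0.65, 0.60, 0.58, 0.10, 0.05, 0.12)
target    <- c(1, 1, 1, 1, 1, 0, 0, 0)

cosine(estimated, target)  # values near 1 indicate a good match
```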
If model fit remains poor, it is time for investigative work. My preferred framework is that of Saris, Satorra & van der Veld (2009), who combine modification indices, power, and judgement to evaluate local rather than global misspecification. I wrote about it here: Misspecification and fit indices in covariance-based SEM. It is also implemented in lavaan.
The general idea is that if you have enough data, your model will always be rejected, since all models are wrong, and then your global fit indices will look bad. But not all misspecifications matter. So you investigate each misspecification, evaluate its importance, and then either modify your model (conceding a lapse in your original theory and generating a new one, which you will have to confirm on a new dataset) or retain it.
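In lavaan, this procedure is available through `modindices()`: with `power = TRUE`, each modification index is paired with the power to detect a misspecification of size `delta`, and a decision column summarizes the verdict. A sketch, assuming `fit` is a model you have already fitted:

```r
# Saris, Satorra & van der Veld (2009): judge each potential
# misspecification by its modification index, expected parameter
# change AND power, not by the MI alone
mi <- modindices(fit, power = TRUE, delta = 0.1,
                 alpha = 0.05, high.power = 0.75)
mi  # inspect the epc, power and decision columns
```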
I hope this helps
Works Cited
- McNeish, D., An, J., & Hancock, G. R. (2017). The thorny relation between measurement quality and fit index cutoffs in latent variable models. Journal of Personality Assessment. https://doi.org/10.1080/00223891.2017.1281286
- Ferrando, P. J., & Lorenzo-Seva, U. (2000). Unrestricted versus restricted factor analysis of multidimensional test items: some aspects of the problem and some suggestions. Psicológica, 21(2), 301–323. Retrieved from http://www.redalyc.org/pdf/169/16921206.pdf
- Saris, W. E., Satorra, A., & van der Veld, W. M. (2009). Testing structural equation models or detection of misspecifications? Structural Equation Modeling: A Multidisciplinary Journal, 16(4), 561–582. https://doi.org/10.1080/10705510903203433
Best Answer
A CFI of 0.9 is generally considered not very good (nowadays, at least). So calling a CFI below 0.9 "almost a good fit" is, IMHO, stretching the truth somewhat.
So why do you have good RMSEA and poor CFI? Because the two indices assess fit in different ways. RMSEA is based on the model's chi-square: lower chi-square means lower RMSEA. CFI assesses fit by comparing your model with the null model. If your variables are not highly correlated, the null model doesn't fit that badly, and it is hard for your model to fit much better than it. But weak correlations also mean there isn't much covariance for the model to explain, hence a good RMSEA.
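A small numerical sketch (the chi-square values are invented) shows the mechanics, using the standard CFI formula and one common formulation of RMSEA:

```r
chisq_m <- 150; df_m <- 80    # fitted model
chisq_b <- 250; df_b <- 105   # null (baseline) model
N <- 400

# CFI compares the model's excess chi-square to the null model's
cfi <- 1 - max(chisq_m - df_m, 0) / max(chisq_b - df_b, 0)

# RMSEA depends only on the model's own chi-square
rmsea <- sqrt(max(chisq_m - df_m, 0) / (df_m * (N - 1)))

cfi    # about 0.52: poor, because the null model already fits decently
rmsea  # about 0.047: good, since there is little misfit per df
```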
Poor CFI combined with good RMSEA suggests weak data: if your variables are unreliable (weakly intercorrelated), your RMSEA will look better and your CFI worse.
Here are a couple of papers that might be helpful:
- http://www.tandfonline.com/doi/abs/10.1080/10705519609540052
- https://www.researchgate.net/publication/221986550_A_time_and_a_place_for_incremental_fit_indices