EFA: Can I remove/drop variables with non-significant loadings and re-run the EFA?

factor analysis

I am applying EFA to 56 items.
However, some items cross-loaded, so I decided to drop them. My question:
The rotated component matrix shows a few items with no significant loading on any component. Should I remove these non-loading items and re-run the EFA until every remaining item loads on a component?

Thanks

Best Answer

If you are willing to accept a connection between EFA and PCA, then the answer is no.

You are describing an approach called simple thresholding: set every loading whose absolute value is below a threshold to zero, then drop the variables that no longer load on anything. Cadima and Jolliffe (1995) noted that this method can be misleading. For example, one should also look at the standard deviations of the variables to determine the contribution of a variable to a given PC. Furthermore, if you drop those variables and completely re-estimate, you tend to overestimate the loadings of the variables you kept, giving you worse forecasting error (still of interest to you, because it is a good proxy for how good the model really is).
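To make the procedure concrete, here is a minimal sketch of simple thresholding in plain Python. The loading values, the item names and the 0.3 cutoff are all hypothetical, chosen only for illustration; this shows the approach being cautioned against, not a recommendation.

```python
def threshold_loadings(loadings, cutoff=0.3):
    """Set every loading with absolute value below `cutoff` to zero."""
    return [[x if abs(x) >= cutoff else 0.0 for x in row] for row in loadings]


def unloaded_items(loadings, names, cutoff=0.3):
    """Return the items whose thresholded loadings are zero on every component."""
    thresholded = threshold_loadings(loadings, cutoff)
    return [name for name, row in zip(names, thresholded) if not any(row)]


# Hypothetical rotated loadings: rows are items, columns are components.
names = ["item1", "item2", "item3", "item4"]
loadings = [
    [0.72, 0.10],
    [0.05, 0.64],
    [0.21, 0.18],  # below the cutoff on both components
    [0.45, 0.41],  # cross-loads on both components
]

print(unloaded_items(loadings, names))  # ['item3']
```

Note that the cutoff is applied to the raw loadings only, which is exactly why the standard deviations of the variables are ignored by this procedure.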

For PCA, two alternatives are SCoTLASS (Jolliffe et al. (2003)) and SPCA (Zou, Hastie and Tibshirani (2004)). Both perform variable selection in a way similar to the LASSO. You can run SPCA in R using the elasticnet package, or in MATLAB using this toolbox by Karl Sjöstrand.
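To illustrate how a LASSO-style penalty produces exact zeros in the loadings during estimation, rather than by cutting them off afterwards, here is a rough sketch in plain Python. It is neither SCoTLASS nor SPCA: it is a much simpler soft-thresholded power iteration for the leading component only, and the covariance matrix and the penalty `lam` are hypothetical.

```python
import math


def soft_threshold(x, lam):
    """Shrink x toward zero by lam; anything within [-lam, lam] becomes 0."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0


def sparse_leading_component(cov, lam, n_iter=200):
    """Power iteration on `cov` with soft-thresholding at every step.

    With lam = 0 this is ordinary power iteration for the first PC.
    """
    p = len(cov)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(n_iter):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        w = [soft_threshold(x, lam) for x in w]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            raise ValueError("lam is too large: every loading was shrunk to zero")
        v = [x / norm for x in w]
    return v


# Hypothetical covariance: variables 1 and 2 move together, variable 3 barely.
cov = [
    [1.0, 0.8, 0.1],
    [0.8, 1.0, 0.1],
    [0.1, 0.1, 1.0],
]

sparse = sparse_leading_component(cov, lam=0.5)
print([round(x, 3) for x in sparse])  # [0.707, 0.707, 0.0]
```

Here the third variable is removed as part of the estimation itself, so the remaining loadings are not re-estimated on a reduced variable set afterwards. For real data, the packages mentioned above are the better choice.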

After applying one of these methods, you can potentially still obtain a useful interpretation from the PCA loadings. PCA is one way to perform factor analysis, so hopefully this is not too unpalatable. Some discussion of the differences between FA and PCA, and of their equivalences, can be found here.

Rotations can potentially obscure the sparsity that the methods I mentioned produce. However, if you estimate your initial loadings this way, I think you can probably then exclude the useless variables and rotate for FA. The point is that you will already have estimated the loadings in a way that avoids overestimation.
