Solved – EFA: Can I remove/drop variables with non-significant loadings and re-run the EFA?

factor analysis

I am applying EFA to 56 items.
Some items cross-loaded, so I decided to drop them. My question:
The rotated component matrix shows a few items with no significant loadings on any component. Should I remove these non-loading items and re-run the EFA until every item loads on a component?

Thanks

Best Answer

If you are willing to accept a connection between EFA and PCA, then the answer is no.

You are describing an approach called simple thresholding. It works by setting every loading whose absolute value is below a threshold to zero, after which some variables can be dropped. Cadima and Jolliffe (1995) noted that this method can be misleading. For example, one should also look at the standard deviations of the variables to determine the contribution of a variable to a certain PC. Furthermore, if you drop those variables and completely re-estimate, you tend to overestimate the loadings of the variables you kept, which gives you worse forecasting error (still of interest to you, because it is a good proxy for how good the model really is).
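To make the procedure concrete, here is a minimal sketch of simple thresholding in plain Python. The loading values, the item names and the 0.3 cutoff are all made up for illustration; they are assumptions, not values from the question.

```python
# Hypothetical rotated loading matrix: one row per item, one column per component.
loadings = {
    "item1": [0.72, 0.10],
    "item2": [0.65, -0.05],
    "item3": [0.08, 0.81],
    "item4": [0.12, -0.15],  # loads weakly on every component
}


def simple_threshold(loadings, cutoff=0.3):
    """Set every loading with absolute value below the cutoff to zero."""
    return {
        item: [x if abs(x) >= cutoff else 0.0 for x in row]
        for item, row in loadings.items()
    }


def items_to_drop(thresholded):
    """Items that are left with no non-zero loading on any component."""
    return [item for item, row in thresholded.items() if all(x == 0.0 for x in row)]


thresholded = simple_threshold(loadings)
dropped = items_to_drop(thresholded)
print(thresholded)
print(dropped)
```

Note that the sketch only looks at the size of each loading. It ignores the variables' standard deviations and says nothing about what happens to the remaining loadings after re-estimation, which are exactly the weaknesses described above.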

For PCA, two alternatives are SCoTLASS (Jolliffe et al. (2003)) and SPCA (Zou, Hastie and Tibshirani (2004)). Both use LASSO-style penalties to perform variable selection on the loadings. You can run SPCA in R using the elasticnet package, or in MATLAB using this toolbox by Karl Sjöstrand.
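The idea of letting a LASSO-style penalty push small loadings to exactly zero during estimation can be sketched in a few lines. The code below is neither SCoTLASS nor SPCA; it is a simplified stand-in (a power iteration with soft-thresholding) on a hypothetical 3 x 3 covariance matrix, and the penalty value `lam` is an arbitrary assumption.

```python
import math


def soft_threshold(x, lam):
    """Shrink x toward zero by lam; values within [-lam, lam] become exactly 0."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0


def sparse_first_loading(cov, lam, n_iter=100):
    """First sparse loading vector of a covariance matrix (illustrative only)."""
    p = len(cov)
    v = [1.0 / math.sqrt(p)] * p  # start from equal weights
    for _ in range(n_iter):
        # Power-iteration step: multiply the current vector by the covariance matrix.
        u = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        # L1-style shrinkage sets small loadings to exactly zero.
        u = [soft_threshold(x, lam) for x in u]
        norm = math.sqrt(sum(x * x for x in u))
        if norm == 0.0:
            raise ValueError("lam is too large: every loading was shrunk to zero")
        v = [x / norm for x in u]  # rescale to unit length
    return v


# Hypothetical covariance matrix: variables 1 and 2 are strongly related,
# variable 3 is almost unrelated to both.
cov = [
    [1.00, 0.80, 0.05],
    [0.80, 1.00, 0.05],
    [0.05, 0.05, 1.00],
]
v = sparse_first_loading(cov, lam=0.5)
print([round(x, 3) for x in v])
```

The third variable ends up with a loading of exactly zero as part of the estimation itself, rather than being zeroed out afterwards and followed by a full re-estimation. For real analyses, use an established implementation such as the elasticnet package mentioned above.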

After applying one of these methods, you can potentially still obtain a useful interpretation from the PCA loadings. PCA is one way to perform Factor Analysis, so hopefully it is not too unpalatable. Some discussion of the differences between FA and PCA, and of their equivalences, can be found here.

Rotations can potentially obscure the sparsity that the methods I mentioned produce. However, if you estimate your initial loadings this way, I think you can probably then exclude the useless variables and rotate for FA. The point is that you will already have estimated the loadings in a way that avoids overestimation.

Related Solutions

Solved – Exploratory factor analysis – promax & factor cross-loadings

Firstly, principal components and factor analysis are quite different methods. PCA is normally used more as a data reduction technique, while factor analysis is more concerned with finding a latent structure.

On the cross-loadings: the oblique rotation allows the factors to be correlated, but typically one would not want items to load on multiple factors. In this case, I would probably examine the factor loadings using other oblique rotations, such as oblimin, to see if these cross-loadings still appear.

Cross-loadings below .3 are often ignored, but if you have multiple samples with the same cross-loadings, this may be an indication that the item is indeed associated with more than one factor. Typically, such items are discarded, and I would probably do so unless you have a strong theoretical or practical rationale for retaining them.
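As a rough illustration of the .3 rule of thumb, the following sketch flags items whose absolute loading reaches the cutoff on more than one factor. The pattern matrix and item names are hypothetical, and the cutoff is a convention rather than a fixed rule.

```python
def cross_loading_items(loadings, cutoff=0.3):
    """Return the items with |loading| >= cutoff on more than one factor."""
    flagged = []
    for item, row in loadings.items():
        n_salient = sum(1 for x in row if abs(x) >= cutoff)
        if n_salient > 1:
            flagged.append(item)
    return flagged


# Hypothetical pattern matrix after an oblique rotation (two factors).
pattern = {
    "q1": [0.70, 0.12],
    "q2": [0.45, 0.41],   # salient on both factors
    "q3": [0.05, 0.66],
    "q4": [0.62, -0.35],  # negative cross-loading also counts
}

flagged = cross_loading_items(pattern)
print(flagged)
```

Running the same check on the loadings from each sample, and on each rotation tried, shows whether a flagged item cross-loads consistently or only in one solution.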

Finally, it sounds like you have two samples. In this case, I would perform EFA on your first sample, and then use the second sample to validate your model. This will raise the probability that you are modelling something real, rather than noise.

Solved – the relationship between scale reliability measures (Cronbach’s alpha etc.) and component/factor loadings

I am going to add an answer here even though the question was asked a year ago. Most people who are concerned with measurement error will tell you that using factor scores from a CFA is not the best way to move forward. Doing a CFA is good. Estimating factor scores is ok as long as you correct for the amount of measurement error associated with those factor scores in subsequent analyses (an SEM program is the best place to do this).

To get the reliability of the factor score, you need to first calculate the latent construct's reliability from your CFA (or rho):

rho = Factor score variance / (Factor score variance + Factor score standard error^2).

Note that the factor score standard error^2 is the error variance of the factor score. This information is available in Mplus by requesting the PLOT3 output as part of your CFA program.

To calculate the error variance of the factor score, use the following formula:

(1-rho)*(FS variance+FS error variance).

The resulting value is the error variance of the factor score. If you are using Mplus for subsequent analyses, create a latent variable defined by a single item (the factor score) and then fix the factor score's error variance at that value:

LatentF BY FScore@1;
FScore@(calculated error variance of factor score);
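The two formulas above can be checked with a short worked example. The factor score variance (0.64) and standard error (0.30) below are made-up numbers, and the function names are only illustrative.

```python
def factor_score_reliability(fs_variance, fs_standard_error):
    """rho = FS variance / (FS variance + FS standard error^2)."""
    return fs_variance / (fs_variance + fs_standard_error ** 2)


def factor_score_error_variance(fs_variance, fs_standard_error):
    """(1 - rho) * (FS variance + FS error variance), with FS error variance = SE^2."""
    rho = factor_score_reliability(fs_variance, fs_standard_error)
    return (1.0 - rho) * (fs_variance + fs_standard_error ** 2)


# Hypothetical values, as they might be read from a CFA output.
fs_variance = 0.64
fs_se = 0.30

rho = factor_score_reliability(fs_variance, fs_se)
error_variance = factor_score_error_variance(fs_variance, fs_se)
print(round(rho, 3), round(error_variance, 3))
```

With rho defined as above, the product (1 - rho) * (FS variance + FS error variance) reduces algebraically to the squared standard error, so the value fixed in the Mplus syntax is the same quantity computed here.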

Hope this is helpful! A great resource for this issue is the set of lecture notes (lecture 11, in particular) from Lesa Hoffman's SEM class at the University of Nebraska, Lincoln. http://www.lesahoffman.com/948/
