Solved – Differences between exploratory factor analysis, confirmatory factor analysis and principal component analysis

Tags: confirmatory-factor, factor-analysis, pca

Before it is pointed out: I am aware that a very similar question has already been asked. Still, I am in doubt regarding the concept.

More specifically, it is mentioned by the most voted answer that:

In terms of a simple rule of thumb, I'd suggest that you:

  1. Run factor analysis if you assume or wish to test a theoretical model of latent factors causing observed variables.

  2. Run principal components analysis if you want to simply reduce your correlated observed variables to a smaller set of important independent composite variables.

Question 1:

Based on the results I obtained from R, I have difficulty understanding where exactly I am supplying my theoretical model of latent factors. I am using the functions from statsmethods. For both factanal() and princomp() the input was the same: a table in which each row represents one data point and the columns are the attributes I want to reduce. This adds to my confusion about where the pre-assumed model plays its role. For the factor analysis, I used parallel analysis (also suggested by the site) via the nScree() function to determine the number of factors, and I specified whether I wanted a varimax (orthogonal) or promax (oblique) rotation. Is that what is meant by the model? Being able to choose the number of factors and the type of rotation?

The results, shown as plots for both PCA and EFA, also don't seem to highlight this difference, which adds to my confusion. Where can this distinction be observed in them?

[Figure: PCA]

[Figure: EFA]

Question 2: — Answered

I bought a book by Richard L. Gorsuch to study this. In the book, the author draws attention to a difference between PCA (Principal Component Analysis) and EFA (Exploratory Factor Analysis): it is mentioned that PCA is for the population while EFA is for the sample. Is that true? I haven't seen that mentioned in any discussion I have read so far. Is it irrelevant?

Question 3:

I noticed that all these methods seem to impose a normal distribution constraint. I also read that for larger data sets this constraint can be ignored. Is that true, or are PCA, EFA and CFA sensitive to violations of the distributional assumption?

Question 4:

Where in the results of PCA and EFA should I see that one is about latent factors (EFA) and the other just groups the variables into components (PCA)? The outputs from R look the same to me. Is it just a matter of how I interpret the factors shown in the output? Both show me a table in which I can see which of my variables are expressed the most in each factor. How should my interpretation of which variable belongs to which factor differ between PCA and EFA? Is EFA saying that the variables with higher expression are better explained by that latent factor, while PCA is saying that the component merely holds those variables together based on what is observed?

Question 5
Finally the last question is regarding CFA (Confirmatory Factor Analysis).

The same website shows the following image:

[Figure: Confirmatory Factor Analysis]

I read that CFA usually follows EFA for hypothesis testing. In that sense, does EFA tell you what the latent factors are (the output factors), and do you then use CFA, assuming the factors you found with EFA, for hypothesis testing?

Question 6

For EFA, one of the rotations available in the literature is direct oblimin. I heard that it can account for both promax and varimax, so 'it takes the best of both worlds'. Is that true? I am also trying to find a function that implements it in R, since the one suggested on the site does not. I would be happy to get any suggestions on this.


I hope it is clear that this question is much more specific about my doubts regarding EFA and PCA, and also covers CFA, so that it does not get closed as a duplicate. If at least one of the questions is answered, I will be more than happy, as it would help clear up the confusion in my head.

Thank you.

Best Answer

I will just address question 2. I have some doubts about how well the author knows his subject if he really said it the way you have presented it. PCA is applied to the sample, just like EFA and CFA. It simply takes a list of n possibly related variables, looks at how the points (samples) scatter in n-dimensional space, and then gets the first principal component as the linear combination that explains more of the variability in the data than any other linear combination. The second then looks at directions orthogonal to the first to find the one that explains the most of the remaining variability, and so on with the 3rd and 4th. So sometimes one can take just 1-3 components to describe most of the variation in the data. That is why factor analysis and principal component analysis are described according to 1 and 2 in your statement.
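
To make that description concrete, here is a minimal sketch in R. It uses simulated data rather than the asker's data; the variable names x1, x2, x3 and the simulation settings are assumptions made up for illustration. It uses prcomp() from base R, and princomp() should tell the same story. The sketch shows that the components come out ordered by the share of variance they explain and that their directions are orthogonal to each other.

```r
# Simulated sample (an assumption for illustration, not the asker's data):
# 200 observations of 3 variables, two of which share a common source.
set.seed(1)
n <- 200
z <- rnorm(n)                          # shared source of variation
X <- cbind(x1 = z + rnorm(n, sd = 0.5),
           x2 = z + rnorm(n, sd = 0.5),
           x3 = rnorm(n))              # unrelated to x1 and x2 by construction

# PCA on the sample, with variables centered and scaled to unit variance.
pca <- prcomp(X, center = TRUE, scale. = TRUE)

# Share of total variance explained by each component, largest first.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
print(round(var_explained, 3))

# The component directions (columns of pca$rotation) are orthogonal unit
# vectors, so their cross-product is the identity matrix up to rounding.
print(round(crossprod(pca$rotation), 10))
```

Nothing in this call states a model of latent factors: the components are just linear combinations of the observed columns, which is point 2 of the rule of thumb quoted in the question.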