What is the rationale for applying an exploratory/unsupervised method (PCA or FA with VARIMAX rotation) after having tested a confirmatory model, especially when this is done on the same sample?
In your CFA model, you impose constraints on your pattern matrix, e.g. some items are supposed to load on one factor but not on the others, and the item loadings are already estimated as part of the model fit. A large modification index indicates that freeing a parameter or removing an equality constraint could result in better model fit.
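For concreteness, here is a minimal sketch of such a constrained two-factor model in Python, assuming the semopy package and its Model/calc_stats API; the factor structure and the fake data are illustrative placeholders, not your actual model.

```python
# A hypothetical constrained CFA: each item loads on exactly one factor, and
# cross-loadings are fixed to zero by omission -- these are the constraints a
# large modification index might suggest relaxing.
import numpy as np
import pandas as pd
import semopy

rng = np.random.default_rng(0)
f1, f2 = rng.normal(size=(2, 96))  # two latent factors, n = 96
df = pd.DataFrame({f"x{i+1}": (f1 if i < 3 else f2) + 0.5 * rng.normal(size=96)
                   for i in range(6)})

desc = """
F1 =~ x1 + x2 + x3
F2 =~ x4 + x5 + x6
"""
model = semopy.Model(desc)
model.fit(df)
print(model.inspect())           # item loadings come with the model fit
print(semopy.calc_stats(model))  # global fit statistics
```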
In PCA or FA, on the contrary, there is no such constraint, even following an orthogonal rotation (whose purpose is just to make the factors more interpretable, in that each item will generally tend to load heavily on a single factor rather than on several). It is worth noting, though, that these models are conceptually and mathematically different: the FA model is a measurement model, where we assume that there is some unique error attached to each item; this is not the case in the PCA framework. It is thus not surprising that you failed to replicate your factor structure, which may indicate possible item cross-loadings, low item reliability, low stability of your factor structure, or the existence of a higher-order factor structure, all made worse by your small sample size.
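To see that conceptual difference in practice, here is a minimal sketch contrasting the two models on the same fake data with scikit-learn; the two-factor structure below is purely illustrative.

```python
import numpy as np
from sklearn.decomposition import PCA, FactorAnalysis

rng = np.random.default_rng(0)
F = rng.normal(size=(96, 2))                   # two latent factors, n = 96
load = rng.normal(size=(2, 8))                 # arbitrary loading pattern
X = F @ load + 0.5 * rng.normal(size=(96, 8))  # items = factors + unique error

# PCA: components are exact linear combinations of the observed variables,
# with no unique error term per item.
pca = PCA(n_components=2).fit(X)

# FA: a measurement model with item-specific (unique) variances; the varimax
# rotation only re-expresses the loadings to aid interpretation.
fa = FactorAnalysis(n_components=2, rotation="varimax").fit(X)

print(pca.components_)     # PCA weight vectors
print(fa.components_)      # rotated FA loadings
print(fa.noise_variance_)  # the unique variances PCA does not model
```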
In both cases, but especially for CFA, $N=96$ is a very limited sample size. Although some authors have suggested an individuals-to-items ratio of 5 to 10, it is not merely that ratio but the number of dimensions to estimate that matters. In your case, the estimation of your parameters will be noisy, and in the case of PCA you may expect fluctuations in your estimated loadings (just try a bootstrap to get an idea of the 95% CIs).
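A rough sketch of what that bootstrap could look like in Python (fake single-factor data with $N=96$ as in your study; the number of replicates is arbitrary):

```python
# Percentile 95% CIs for the first-component loadings via row resampling.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(42)
latent = rng.normal(size=(96, 1))
X = latent @ rng.normal(size=(1, 5)) + 0.7 * rng.normal(size=(96, 5))
ref = PCA(n_components=1).fit(X).components_[0]  # full-sample loadings

boot = []
for _ in range(2000):
    idx = rng.integers(0, 96, size=96)                    # resample rows
    comp = PCA(n_components=1).fit(X[idx]).components_[0]
    if comp @ ref < 0:                                    # fix sign flips
        comp = -comp
    boot.append(comp)

lo, hi = np.percentile(boot, [2.5, 97.5], axis=0)
print(np.column_stack([lo, ref, hi]))  # lower bound, estimate, upper bound
```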
I will just address question 2. I have some doubts about how well the author knows his subject if he really said it the way you have presented it. PCA is applied to the sample just like EFA and CFA. It simply takes a list of $n$ possibly related variables, looks at how the points (samples) scatter in $n$-dimensional space, and then takes the first principal component as the linear combination that explains more of the variability in the data than any other linear combination. The second component then looks among directions orthogonal to the first to find the one out of those that explains the most of the remaining variability, and so on with the 3rd and 4th. So sometimes one can take just 1-3 components to describe most of the variation in the data. That is why factor analysis and principal component analysis are described according to 1 and 2 in your statement.
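A tiny illustration of that description in Python (the simulated single-factor data are mine, just to make the point about successive components):

```python
# Successive components explain the most remaining variance, so a few of
# them often summarize most of the data.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
latent = rng.normal(size=(200, 1))
X = latent @ rng.normal(size=(1, 4)) + 0.3 * rng.normal(size=(200, 4))

pca = PCA().fit(X)
print(pca.explained_variance_ratio_)  # the first component dominates here
```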
Best Answer
If you include all the components, not just the ones that have an eigenvalue above 1, you will get the same $R^2$. The extracted principal components can just be thought of as linear combinations of the original variables, and a regression on these transformed variables need not have any obvious relationship to the regression on the original variables.
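A quick numerical check of that claim in Python (my own fake data, independent of the SPSS example below):

```python
# Regressing on ALL principal components reproduces the R^2 of the original
# regression, because keeping every PC is just an invertible linear
# transformation (centering plus rotation) of the X's.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(7)
X = rng.normal(size=(500, 5))
y = X @ np.array([0.5, -0.3, 0.2, 0.0, 0.1]) + rng.normal(size=500)

r2_orig = LinearRegression().fit(X, y).score(X, y)
Z = PCA(n_components=5).fit_transform(X)    # all 5 components
r2_pcs = LinearRegression().fit(Z, y).score(Z, y)
print(round(r2_orig, 6), round(r2_pcs, 6))  # identical up to rounding
```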
Here is an example in SPSS. Let's make some fake data for 5 X variables. Four of these variables are highly correlated, but one is uncorrelated. So here X1 to X4 do have an underlying latent variable, but they each uniquely affect Y in different ways. So when we regress Y on all of the X's, we get close to the population parameters for the estimates - even with many of the X's highly correlated. The $R^2$ here is only 0.168 - but this is the correct specification given my simulation.
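The SPSS syntax itself is not reproduced in the text; a rough Python re-creation under assumed simulation parameters might look like this (the coefficients and noise level are my guesses, so the exact $R^2$ will differ from the 0.168 above):

```python
# X1-X4 share a latent factor, X5 is (nearly) independent, and Y gets a
# small unique effect from each X.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 5000
latent = rng.normal(size=n)
X = np.column_stack([latent + 0.5 * rng.normal(size=n) for _ in range(4)] +
                    [0.95 * rng.normal(size=n)])  # X5 (nearly) uncorrelated
beta = np.array([0.2, -0.2, 0.3, -0.1, 0.25])     # assumed unique effects
y = X @ beta + 0.85 * rng.normal(size=n)

ols = LinearRegression().fit(X, y)
print(ols.coef_)        # close to beta despite the collinearity among X1-X4
print(ols.score(X, y))  # a modest R^2, analogous to the 0.168 above
```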
Now, if we ran the default FACTOR command, we would only extract one component using the eigenvalue-above-1 rule. This is because I did not generate X5 to be exactly orthogonal to the other X's. Its eigenvalue ends up being very close to 1 though, and if you looked at the scree plot you would probably keep it. Here I force SPSS to extract all 5 principal components (pet peeve: it makes no sense to talk about rotation when we are extracting principal components!). Now, if we include all 5 of these orthogonal principal components, we get the same $R^2$ as the original equation, 0.168, but the coefficients have no relationship to the simulated data. The $R^2$ will never be higher than what was found including the original measurements. If we mindlessly use the eigenvalue-over-1 criterion, the $R^2$ decreases to 0.08.

This, unfortunately, is a very common procedure in the social sciences where it isn't necessarily warranted. When you do this, you are basically making a case for a congeneric measurement model in which the underlying latent variable is what affects Y, and you measure that latent variable using the principal component scores. If the original variables can affect Y in unique ways, reducing those variables to their principal component scores is inappropriate. Many times people do it mindlessly just because a few correlations are high - although if you look at the original regression here, the standard errors are small enough that collinearity should be of no concern at all.