First of all, I second ttnphns's recommendation to look at the solution before rotation. Factor analysis as implemented in SPSS is a complex procedure with several steps; comparing the results of each of these steps should help you pinpoint the problem.
Specifically, you can run
FACTOR
/VARIABLES <variables>
/MISSING PAIRWISE
/ANALYSIS <variables>
/PRINT CORRELATION
/CRITERIA FACTORS(6) ITERATE(25)
/EXTRACTION ULS
/CRITERIA ITERATE(25)
/ROTATION NOROTATE.
to see the correlation matrix SPSS is using to carry out the factor analysis. Then, in R, prepare the same correlation matrix yourself. Since the SPSS syntax above requests /MISSING PAIRWISE, the matching call uses pairwise deletion:
r <- cor(data, use="pairwise.complete.obs")  # pairwise deletion, as in SPSS
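If you want to compare the two matrices mechanically rather than by eye, a small sketch could look like this (assuming you have exported or pasted the SPSS-printed matrix into a file, here hypothetically named spss_corr.txt):
r.spss <- as.matrix(read.table("spss_corr.txt"))  # hypothetical export of the SPSS matrix
max(abs(r - r.spss))  # should be near zero, up to the printed precision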
Any discrepancy in the way missing values are handled should be apparent at this stage. Once you have checked that the correlation matrix is the same, you can feed it to the fa function and run your analysis again:
fa.results <- fa(r, nfactors=6, rotate="promax",
scores=TRUE, fm="pa", oblique.scores=FALSE, max.iter=25)
If you still get different results in SPSS and R, the problem is not related to missing values.
Next, you can compare the results of the factor analysis/extraction method itself.
FACTOR
/VARIABLES <variables>
/MISSING PAIRWISE
/ANALYSIS <variables>
/PRINT EXTRACTION
/FORMAT BLANK(.35)
/CRITERIA FACTORS(6) ITERATE(25)
/EXTRACTION ULS
/CRITERIA ITERATE(25)
/ROTATION NOROTATE.
and
fa.results <- fa(r, nfactors=6, rotate="none",
scores=TRUE, fm="pa", oblique.scores=FALSE, max.iter=25)
Again, compare the factor matrices/communalities/sum of squared loadings. Here you can expect some tiny differences but certainly not of the magnitude you describe. All this would give you a clearer idea of what's going on.
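To pull the quantities being compared out of the psych object, something along these lines should work (a sketch, assuming the unrotated fit above is stored in fa.results):
print(fa.results$loadings, cutoff=.35)  # factor matrix, mimicking /FORMAT BLANK(.35)
fa.results$communality                  # communalities
colSums(fa.results$loadings^2)          # sum of squared loadings per factor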
Now, to answer your three questions directly:
- In my experience, it's possible to obtain very similar results, sometimes after spending some time figuring out the different terminologies and fiddling with the parameters. I have had several occasions to run factor analyses in both SPSS and R (typically working in R and then reproducing the analysis in SPSS to share it with colleagues) and always obtained essentially the same results. I would therefore generally not expect large differences, which leads me to suspect the problem might be specific to your data set. I did however quickly try the commands you provided on a data set I had lying around (it's a Likert scale) and the differences were in fact bigger than I am used to but not as big as those you describe. (I might update my answer if I get more time to play with this.)
- Most of the time, people interpret the sum of squared loadings after rotation as the “proportion of variance explained” by each factor but this is not meaningful following an oblique rotation (which is why it is not reported at all in psych and SPSS only reports the eigenvalues in this case – there is even a little footnote about it in the output). The initial eigenvalues are computed before any factor extraction. Obviously, they don't tell you anything about the proportion of variance explained by your factors and are not really “sum of squared loadings” either (they are often used to decide on the number of factors to retain). SPSS “Extraction Sums of Squared Loadings” should however match the “SS loadings” provided by psych.
- This is a wild guess at this stage, but have you checked whether the factor extraction procedure converged in 25 iterations? If the rotation fails to converge, SPSS does not output any pattern/structure matrix, so you can't miss it; but if the extraction fails to converge, the last factor matrix is displayed nonetheless and SPSS blissfully continues with the rotation. You would however see a note "a. Attempted to extract 6 factors. More than 25 iterations required. (Convergence=XXX). Extraction was terminated." If the convergence value is small (something like .005, the default stopping condition being "less than .0001"), it would still not account for the discrepancies you report, but if it is really large there is something pathological about your data. A quick way to rule this out in R is sketched just below.
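Regarding that last point, you can simply raise the iteration limit well above 25 and see whether the solution changes (a sketch; the limit of 500 is arbitrary):
fa.check <- fa(r, nfactors=6, rotate="none", fm="pa", max.iter=500)
# compare fa.check$loadings with the 25-iteration solution
In SPSS, the equivalent is raising the value of ITERATE() in the /CRITERIA subcommand.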
Firstly, principal components and factor analysis are quite different methods. PCA is normally used more as a data reduction technique, while factor analysis is more concerned with finding a latent structure.
On the cross-loadings: the oblique rotation allows the factors to be correlated, but typically one would not want items to load on multiple factors. In this case, I would probably examine the factor loadings under other oblique rotations, such as oblimin, to see whether these cross-loadings still appear; see the sketch below.
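With the psych package this is a one-line change (a sketch; rotate="oblimin" requires the GPArotation package to be installed):
fa.oblimin <- fa(r, nfactors=6, rotate="oblimin", fm="pa", max.iter=25)
print(fa.oblimin$loadings, cutoff=.3)  # show only loadings above .3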
Cross-loadings below .3 are often ignored, but if you have multiple samples with the same cross-loadings, this may be an indication that the item is indeed associated with more than one factor. Typically, such items are discarded, and I would probably do so unless you have a strong theoretical or practical rationale for retaining them.
Finally, it sounds like you have two samples. In this case, I would perform EFA on your first sample, and then use the second sample to validate your model. This will raise the probability that you are modelling something real, rather than noise.
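A minimal sketch of this check within psych (data1 and data2 are placeholder names for your two samples; the settings mirror those used earlier in this thread):
fa1 <- fa(data1, nfactors=6, rotate="promax", fm="pa")
fa2 <- fa(data2, nfactors=6, rotate="promax", fm="pa")
factor.congruence(fa1, fa2)  # Tucker congruence; values near 1 suggest replication
A stricter alternative is a confirmatory factor analysis of the second sample, for instance with the lavaan package.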
Let me first recommend that you read this Q/A. It is about rotations and may hint at, or partly answer, your question.
A more specific answer from me about interpretation might be as follows. Theoretically, a factor in factor analysis is a univariate latent feature, or essence. It is not the same thing as a set or cluster of phenomena. The term "construct" in psychometrics is generic and could be conceptualized as a factor (essence), a cluster (prototype), or something else. Since a factor is a univariate essence, it should be interpreted as the (relatively simple) meaning lying at (or "behind") the intersection of the meanings/contents of the variables loaded by the factor.
With oblique rotation, factors are not orthogonal; still, we usually prefer to interpret each factor as an entity clean of the other factors. That is, ideally, the label of factor X would dissociate from the label of a correlated factor Y, to stress the individuality of both factors, while assuming that "in outer reality" they correlate. Correlatedness thus becomes a characteristic of the entities that is kept apart from the entities' labels.
If this is the strategy preferred, then the pattern matrix appears to be the main tool for interpretation. The coefficients of the pattern matrix are the unique loads, or investments, of the given factor into the variables, because they are regression coefficients$^1$. [I insist that it is better to say "factor loads variable" than "variable loads factor".] The structure matrix contains the (zero-order) correlations between factors and variables. The more two factors X and Y correlate with each other, the greater the discrepancy can be between the pattern loadings and the structure loadings on some variable V. While V ought to correlate ever higher with both factors, the regression coefficients may rise for both or for only one of the two. The latter case means that it is the part of X which differs from Y that loads V so much; hence the V-X pattern coefficient is what is highly valuable in the interpretation of X.
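The relationship between the two matrices is easy to verify on a fitted psych object (a sketch, assuming an oblique solution such as the promax fit earlier in this thread, stored in fa.results):
fa.results$Structure                    # structure: variable-factor correlations
fa.results$loadings %*% fa.results$Phi  # pattern weights times factor correlations
# the two should coincide: S = P %*% Phi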
A weak side of the pattern matrix is that it is less stable from sample to sample (as regression coefficients usually are, compared with correlation coefficients). Relying on the pattern matrix in interpretation requires a well-planned study with a sufficient sample size. For a pilot study and a tentative interpretation, the structure matrix might be the better choice.
The structure matrix seems to me potentially better than the pattern matrix for the back interpretation of variables by factors, if such a task arises. That task can arise when we validate items in questionnaire construction, that is, when we decide which variables to select and which to drop from the scale being created. Just remember that in psychometrics the usual validity coefficient is a correlation (and not a regression) coefficient between the construct/criterion and the item. Usually I include an item in a scale this way: (1) look at the maximal correlation (structure matrix) in the item's row; (2) if the value is above a threshold (say, .40), select the item if its situation in the pattern matrix confirms the decision (i.e. the item is loaded by the factor, desirably only by that one, whose scale we are constructing); this rule is sketched in code below. The factor score coefficient matrix is also useful, in addition to the pattern and structure loadings, in the job of selecting items for a factor construct.
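In R, the two-step rule could be sketched like this (the .40 threshold and the target factor number are assumptions to adapt; fa.results is an oblique fit as above):
S <- unclass(fa.results$Structure)  # structure loadings
P <- unclass(fa.results$loadings)   # pattern loadings
target <- 1                         # the factor whose scale we are constructing
keep <- apply(abs(S), 1, max) > .40 &            # (1) high structure correlation...
        apply(abs(S), 1, which.max) == target &  # ...with the target factor
        apply(abs(P), 1, which.max) == target    # (2) pattern matrix confirms it
rownames(S)[keep]                   # candidate items for the scale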
If you do not perceive a construct as a univariate trait, then using classic factor analysis is questionable. A factor is thin and sleek; it is not like a pangolin or an armful of whatever. A variable loaded by the factor is the factor's mask: the factor shows through it, behind what appears to be completely unlike that factor.
$^1$ Pattern loadings are the regression coefficients of the factor model equation. In the model, the predicted variable is either the standardized (in a FA of correlations) or the centered (in a FA of covariances) observed feature, while the factors are standardized latent features (with variance 1). The coefficients of that linear combination are the pattern matrix values. As the pictures below suggest, pattern coefficients are typically not greater in absolute value than the structure coefficients, which are the correlations or covariances between the predicted variable and the standardized factors (with strongly correlated factors, though, a pattern coefficient can exceed the corresponding structure coefficient).
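Spelled out, for a (standardized) variable $V$ and $m$ standardized factors $F_1, \dots, F_m$, the model is
$$V = p_{V1}F_1 + p_{V2}F_2 + \dots + p_{Vm}F_m + u_V E_V,$$
where the $p_{Vk}$ are the pattern coefficients and $E_V$ is the unique factor. Taking the correlation of both sides with $F_j$ (with $\phi_{kj} = \mathrm{corr}(F_k, F_j)$) gives the structure coefficients, $s_{Vj} = \sum_k p_{Vk}\phi_{kj}$, i.e. $\mathbf{S} = \mathbf{P}\boldsymbol{\Phi}$.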
Some geometry. Loadings are the coordinates of variables (as their vector endpoints) in the factor space. We usually encounter them on "loading plots" and "biplots". See the formulas.
Left. Without rotation or with an orthogonal rotation, the axes (factors) are geometrically orthogonal (as well as statistically uncorrelated) to each other. The only coordinates possible are the perpendicular ones shown. These are what are called the "factor loading matrix" values.
Right. After an oblique rotation the factors are no longer orthogonal (and statistically they are correlated). Here two types of coordinates can be drawn: perpendicular ones (these are the structure values, the correlations) and skew ones (or, to coin a word, "alloparallel" ones: these are the pattern values, the regression weights).
In math jargon, perpendicular coordinates are called covariant ones and skew coordinates are called contravariant ones.
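A tiny numeric illustration of the two coordinate types (made-up values; two factors at 60 degrees, so their correlation is cos 60 = .5):
Phi <- matrix(c(1, .5, .5, 1), 2)  # factor correlation matrix
s   <- c(.70, .60)                 # perpendicular (structure) coordinates of V
p   <- solve(Phi, s)               # skew (pattern) coordinates: P = Phi^{-1} S
Phi %*% p                          # recovers the structure values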
Of course, it is possible to plot pattern or structure coordinates while forcing the axes to be geometrically orthogonal on the plot - this is what happens when you take the table of loadings (pattern or structure) and have your software build a standard scatterplot of them - but then the angle between the variable vectors will appear widened. It will therefore be a distorted loading plot, since the cosine of the original angle between the vectors reflected the correlation coefficient between the variables.
See detailed explanation of a loading plot (in settings of orthogonal factors) here.