Solved – What’s the difference between a component and a factor in parallel analysis

factor-analysis, parallel-analysis, pca, r

The psych package in R has a fa.parallel function to help determine the number of factors or components. From the documentation:

One way to determine the number of factors or components in a data matrix or a correlation matrix is to examine the "scree" plot of the successive eigenvalues. Sharp breaks in the plot suggest the appropriate number of components or factors to extract. "Parallel" analysis is an alternative technique that compares the scree of factors of the observed data with that of a random data matrix of the same size as the original. fa.parallel.poly does this for tetrachoric or polychoric analyses.

When I run the function I get the following output:

Parallel analysis suggests that the number of factors =  7  
                            and the number of components =  4

What is the difference between a factor and a component?

Best Answer

You might wish to read Dinno's "Gently Clarifying the Application of Horn's Parallel Analysis to Principal Component Analysis Versus Factor Analysis". Here's a short distillation:

Principal component analysis (PCA) involves the eigen-decomposition of the correlation matrix $\mathbf{R}$ (or, less commonly, the covariance matrix $\mathbf{\Sigma}$). This yields eigenvectors, which are generally what the substantive interpretation of PCA is about, and eigenvalues, $\mathbf{\Lambda}$, which are what empirical retention decisions, such as parallel analysis, are based on.
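As a concrete sketch, the PCA side of this takes only a few lines of base R (the data here are randomly generated purely for illustration):

```r
# Illustrative data: n = 100 observations of p = 5 variables
set.seed(1)
X <- matrix(rnorm(100 * 5), nrow = 100, ncol = 5)
R <- cor(X)  # the correlation matrix

e <- eigen(R)
e$vectors  # eigenvectors: what substantive PCA interpretation is about
e$values   # eigenvalues, Lambda: what retention decisions are based on

# With R as the input, the eigenvalues apportion p units of total variance:
sum(e$values)  # = 5, the number of variables
```

The last line illustrates the "1 unit per variable" interpretation discussed below: the trace of a correlation matrix is $p$, so its eigenvalues always sum to $p$.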

Common factor analysis (FA) involves the eigen-decomposition of the correlation matrix $\mathbf{R}$ with its diagonal elements replaced by the communalities: $\mathbf{C} = \mathbf{R} - \text{diag}(\mathbf{R}^{+})^{+}$, where $\mathbf{R}^{+}$ denotes the generalized inverse (aka Moore-Penrose inverse, or pseudo-inverse) of $\mathbf{R}$. As with PCA, this yields eigenvectors, which are generally what the substantive interpretation of FA is about, and eigenvalues, $\mathbf{\Lambda}$, which are what retention decisions, such as parallel analysis, are based on.
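A hedged sketch of that diagonal replacement in base R (the data are again random; since this $\mathbf{R}$ is full rank, solve() stands in for the generalized inverse — for a singular $\mathbf{R}$ one would use something like MASS::ginv instead):

```r
# Illustrative data: n = 200 observations of p = 4 variables
set.seed(2)
X <- matrix(rnorm(200 * 4), nrow = 200, ncol = 4)
R <- cor(X)

# diag(R^+)^+ : reciprocals of the diagonal of the (generalized) inverse of R
Rplus <- solve(R)
C <- R - diag(1 / diag(Rplus))  # diagonal of C becomes 1 - 1/Rplus[i,i]

eigen(C)$values  # FA eigenvalues; unlike PCA eigenvalues, some can be negative
```

Note that the new diagonal entries, $1 - 1/(\mathbf{R}^{+})_{ii}$, are the squared multiple correlations of each variable on the others — a standard communality estimate.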

The eigenvalues, $\mathbf{\Lambda} = \{\lambda_{1}, \dots, \lambda_{p}\}$ (where $p$ is the number of variables producing $\mathbf{R}$), are arranged from largest to smallest. In a PCA based on $\mathbf{R}$, they are interpreted as apportioning $p$ units of total variance, under the assumption that each observed variable contributes 1 unit to the total variance. When PCA is based on $\mathbf{\Sigma}$, the eigenvalues instead apportion $\text{trace}(\mathbf{\Sigma})$ units of total variance, under the assumption that each variable contributes the magnitude of its variance. In FA, the eigenvalues are interpreted as apportioning $< p$ units of common variance; this interpretation is problematic because eigenvalues in FA can be negative, and it is difficult to interpret such values either in terms of apportionment or in terms of variance.

The parallel analysis procedure involves:

  1. Obtaining $\{\lambda_{1}, \dots, \lambda_{p}\}$ for the observed data, $\mathbf{X}$.
  2. Obtaining $\{\lambda^{r}_{1}, \dots, \lambda^{r}_{p}\}$ for uncorrelated (random) data of the same $n$ and $p$ as $\mathbf{X}$.
  3. Repeating step 2 many times, say $k$ number of times.
  4. Averaging each eigenvalue from step 3 over $k$ to produce $\{\overline{\lambda}^{r}_{1}, \dots, \overline{\lambda}^{r}_{p}\}$.
  5. Retaining the first $q$ components or common factors for which $\lambda_{q} > \overline{\lambda}^{r}_{q}$.

Monte Carlo parallel analysis employs a high centile (e.g. the 95$^{\text{th}}$) rather than the mean in step 4.
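The five steps, together with the Monte Carlo variant, can be sketched as a small R function for the PCA case (the function name, the `k` default, and the `centile` argument are illustrative choices here, not part of psych's API):

```r
# Minimal parallel-analysis sketch for PCA eigenvalues (steps 1-5 above).
# Set centile (e.g. 0.95) for the Monte Carlo variant; NULL uses the mean.
parallel_pca <- function(X, k = 100, centile = NULL) {
  n <- nrow(X)
  p <- ncol(X)
  obs <- eigen(cor(X))$values              # step 1: observed eigenvalues
  rand <- replicate(k, {                   # steps 2-3: k uncorrelated data sets
    eigen(cor(matrix(rnorm(n * p), n, p)))$values
  })                                       # p x k matrix of random eigenvalues
  if (is.null(centile)) {
    thresh <- rowMeans(rand)               # step 4: average over k
  } else {
    thresh <- apply(rand, 1, quantile, probs = centile)  # Monte Carlo centile
  }
  keep <- obs > thresh                     # step 5: retain while observed
  if (all(keep)) p else which(!keep)[1] - 1  # eigenvalues exceed the threshold
}

# Illustrative use: data with one strong common component
set.seed(3)
f <- rnorm(300)
X <- matrix(rep(f, 6), 300, 6) + 0.5 * matrix(rnorm(300 * 6), 300, 6)
q_mean <- parallel_pca(X, k = 50)                 # mean-based retention count
q_cent <- parallel_pca(X, k = 50, centile = 0.95) # 95th-centile variant
```

For real work one would of course use fa.parallel itself, which handles the FA eigenvalues, resampling options, and plotting; this sketch only makes the retention logic explicit.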
