You might wish to read Dinno's Gently Clarifying the Application of Horn’s Parallel Analysis to Principal Component Analysis Versus Factor Analysis. Here's a short distillation:
Principal component analysis (PCA) involves the eigen-decomposition of the correlation matrix $\mathbf{R}$ (or less commonly, the covariance matrix $\mathbf{\Sigma}$), to give eigenvectors (which are generally what the substantive interpretation of PCA is about), and eigenvalues, $\mathbf{\Lambda}$ (which are what the empirical retention decisions, like parallel analysis, are based on).
Common factor analysis (FA) involves the eigen-decomposition of the correlation matrix $\mathbf{R}$ with the diagonal elements replaced with the communalities: $\mathbf{C} = \mathbf{R} - \text{diag}(\mathbf{R}^{+})^{+}$, where $\mathbf{R}^{+}$ denotes the generalized inverse (aka Moore-Penrose inverse, or pseudo-inverse) of $\mathbf{R}$. This also gives eigenvectors (which, as with PCA, are what the substantive interpretation of FA is about) and eigenvalues, $\mathbf{\Lambda}$ (which, as with PCA, are what the empirical retention decisions, like parallel analysis, are based on).
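If it helps to see that reduced matrix concretely, here is a minimal numpy sketch (my own illustration; the function name is made up): it replaces the diagonal of $\mathbf{R}$ with the communality estimates implied by $\mathbf{C} = \mathbf{R} - \text{diag}(\mathbf{R}^{+})^{+}$.

```python
import numpy as np

def reduced_correlation(R):
    """Correlation matrix with its diagonal replaced by communality
    estimates (squared multiple correlations), i.e. C = R - diag(R^+)^+."""
    R_pinv = np.linalg.pinv(R)            # generalized (Moore-Penrose) inverse of R
    D = np.diag(np.diag(R_pinv))          # diagonal matrix of the diagonal of R^+
    return R - np.linalg.pinv(D)          # off-diagonals unchanged; diagonal becomes 1 - 1/r^ii
```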
The eigenvalues, $\mathbf{\Lambda} = \{\lambda_{1}, \dots, \lambda_{p}\}$ ($p$ equals the number of variables producing $\mathbf{R}$) are arranged from largest to smallest, and in a PCA based on $\mathbf{R}$ are interpreted as apportioning $p$ units of
total variance under an assumption that each observed variable contributes 1 unit to the total variance. When PCA is based on $\mathbf{\Sigma}$, then each eigenvalue, $\lambda$, is interpreted as apportioning $\text{trace}(\mathbf{\Sigma})$ units of total variance under the assumption that each variable contributes the magnitude of its variance to total variance. In FA, the eigenvalues are interpreted as apportioning $< p$ units of common variance; this interpretation is problematic because eigenvalues in FA can be negative and it is difficult to know how to interpret such values either in terms of apportionment, or in terms of variance.
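To make the apportionment interpretation concrete: in a correlation-based PCA the eigenvalues sum to the trace of $\mathbf{R}$,

$$\sum_{i=1}^{p}\lambda_{i} = \text{trace}(\mathbf{R}) = p,$$

so $\lambda_{i}/p$ is the proportion of total variance attributed to the $i^{\text{th}}$ component (and, for a covariance-based PCA, $\lambda_{i}/\text{trace}(\mathbf{\Sigma})$ is that proportion).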
The parallel analysis procedure involves:
1. Obtaining $\{\lambda_{1}, \dots, \lambda_{p}\}$ for the observed data, $\mathbf{X}$.
2. Obtaining $\{\lambda^{r}_{1}, \dots, \lambda^{r}_{p}\}$ for uncorrelated (random) data of the same $n$ and $p$ as $\mathbf{X}$.
3. Repeating step 2 many times, say $k$ times.
4. Averaging each eigenvalue from step 3 over the $k$ repetitions to produce $\{\overline{\lambda}^{r}_{1}, \dots, \overline{\lambda}^{r}_{p}\}$.
5. Retaining those $q$ components or common factors where $\lambda_{q} > \overline{\lambda}^{r}_{q}$.
Monte Carlo parallel analysis employs a high centile (e.g. the 95$^{\text{th}}$) rather than the mean in step 4.
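To make the steps above concrete, here is a minimal numpy sketch of parallel analysis for correlation-based PCA (the function name and defaults are mine, not from Horn's paper); passing a centile gives the Monte Carlo variant just described.

```python
import numpy as np

def parallel_analysis_pca(X, k=1000, centile=None, seed=0):
    """Horn's parallel analysis for PCA of the correlation matrix.

    X       : (n, p) observed data matrix.
    k       : number of random data sets (step 3).
    centile : if given (e.g. 95), use that percentile of the random
              eigenvalues instead of the mean (Monte Carlo variant).
    Returns (observed eigenvalues, reference eigenvalues, number retained).
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape

    # Step 1: eigenvalues of the observed correlation matrix, largest first.
    obs = np.sort(np.linalg.eigvalsh(np.corrcoef(X, rowvar=False)))[::-1]

    # Steps 2-3: eigenvalues for k uncorrelated data sets of the same n and p.
    rand = np.empty((k, p))
    for i in range(k):
        Z = rng.standard_normal((n, p))
        rand[i] = np.sort(np.linalg.eigvalsh(np.corrcoef(Z, rowvar=False)))[::-1]

    # Step 4: mean (or high centile) of each random eigenvalue over the k repetitions.
    ref = rand.mean(axis=0) if centile is None else np.percentile(rand, centile, axis=0)

    # Step 5: retain the leading components whose observed eigenvalue exceeds the reference.
    retain = 0
    for lam, lam_r in zip(obs, ref):
        if lam > lam_r:
            retain += 1
        else:
            break
    return obs, ref, retain
```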
There are two equivalent ways to express the parallel analysis criterion. But first I need to take care of a misunderstanding prevalent in the literature.
The Misunderstanding
Under the so-called Kaiser rule (Kaiser didn't actually like the rule, if you read his 1960 paper), eigenvalues greater than one are retained for principal component analysis. Under the corresponding rule for principal factor analysis/common factor analysis, eigenvalues greater than zero are retained. This confusion has arisen over the years because several authors have been sloppy about using the label "factor analysis" to describe "principal component analysis," when they are not the same thing.
See Gently Clarifying the Application of Horn’s Parallel Analysis to Principal Component Analysis Versus Factor Analysis for the math of it if you need convincing on this point.
Parallel Analysis Retention Criteria
For principal component analysis based on the correlation matrix of $p$ number of variables, you have several quantities. First you have the observed eigenvalues from an eigendecomposition of the correlation matrix of your data, $\lambda_{1}, \dots, \lambda_{p}$. Second, you have the mean eigenvalues from eigendecompositions of the correlation matrices of "a large number" of random (uncorrelated) data sets of the same $n$ and $p$ as your own, $\bar{\lambda}^{\text{r}}_{1},\dots,\bar{\lambda}^{\text{r}}_{p}$.
Horn also frames his examples in terms of "sampling bias" and estimates this bias for the $q^{\text{th}}$ eigenvalue (for principal component analysis) as $\varepsilon_{q} = \bar{\lambda}^{\text{r}}_{q} - 1$. This bias can then be used to adjust the observed eigenvalues: $\lambda^{\text{adj}}_{q} = \lambda_{q} - \varepsilon_{q}$.
Given these quantities you can express the retention criterion for the $q^{\text{th}}$ observed eigenvalue of a principal component parallel analysis in two mathematically equivalent ways:
$\lambda^{\text{adj}}_{q} \left\{\begin{array}{cc} > 1 & \text{Retain.} \\ \le 1 & \text{Not retain.} \end{array}\right.$
$\lambda_{q} \left\{\begin{array}{cc} > \bar{\lambda}^{\text{r}}_{q} & \text{Retain.} \\ \le \bar{\lambda}^{\text{r}}_{q} & \text{Not retain.} \end{array}\right.$
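The equivalence of the two forms is just algebra: substituting $\varepsilon_{q} = \bar{\lambda}^{\text{r}}_{q} - 1$ into the first form gives

$$\lambda^{\text{adj}}_{q} > 1 \iff \lambda_{q} - (\bar{\lambda}^{\text{r}}_{q} - 1) > 1 \iff \lambda_{q} > \bar{\lambda}^{\text{r}}_{q}.$$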
What about for principal factor analysis/common factor analysis? Here we have to bear in mind that the bias is the corresponding mean eigenvalue: $\varepsilon_{q} = \bar{\lambda}^{\text{r}}_{q} - 0 = \bar{\lambda}^{\text{r}}_{q}$ (minus zero because the Kaiser rule for eigendecomposition of the correlation matrix with the diagonal replaced by the communalities is to retain eigenvalues greater than zero). Therefore here $\lambda^{\text{adj}}_{q} = \lambda_{q} - \bar{\lambda}^{\text{r}}_{q}$.
Therefore the retention criterion for principal factor analysis/common factor analysis ought to be expressed as:
$\lambda^{\text{adj}}_{q} \left\{\begin{array}{cc} > 0 & \text{Retain.} \\ \le 0 & \text{Not retain.} \end{array}\right.$
$\lambda_{q} \left\{\begin{array}{cc} > \bar{\lambda}^{\text{r}}_{q} & \text{Retain.} \\ \le \bar{\lambda}^{\text{r}}_{q} & \text{Not retain.} \end{array}\right.$
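The same algebra applies here: $\lambda^{\text{adj}}_{q} > 0 \iff \lambda_{q} - \bar{\lambda}^{\text{r}}_{q} > 0 \iff \lambda_{q} > \bar{\lambda}^{\text{r}}_{q}$.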
Notice that the second form of the retention criterion is the same for both principal component analysis and common factor analysis (the definition of $\lambda^{\text{adj}}_{q}$ changes depending on whether one is retaining components or factors, but the second form is not expressed in terms of $\lambda^{\text{adj}}_{q}$).
One more thing...
Both principal component analysis and principal factor analysis/common factor analysis can be based on the covariance matrix rather than the correlation matrix. Because this changes the assumptions/definitions about the total and common variance, only the second form of the retention criterion ought to be used when basing one's analysis on the covariance matrix.
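In terms of the numpy sketch above, that would mean (this is my own reading, not something from Horn or Dinno) eigen-decomposing np.cov(X, rowvar=False) instead of the correlation matrix, and generating the random data so that each column has the same variance as the corresponding observed variable, for example by independently permuting the columns of $\mathbf{X}$, so that the reference eigenvalues are on the same scale as the observed ones.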
Best Answer
I think the intuition behind MAP can be grasped by looking at the formula for the partial correlation, included in Velicer's (1976) paper (equation 11), which I also write here for convenience:
$$r_{ij.y} = \frac{r_{ij}-r_{iy}r_{jy}}{((1-r_{iy}^2)(1-r_{jy}^2))^{1/2}}$$
In the numerator you have the partial covariance between each pair of variables $i$ and $j$; this number will go down as you partial out more components, since you are removing systematic variance. You divide this number by a normalizing term, much as you divide a covariance by the product of the standard deviations in order to get a correlation coefficient bounded between -1 and +1. This denominator contains the two correlation terms between each of the two variables and the component $y$ that you are removing. These correlation terms will go up as you keep removing components, since the components will contain more and more individual variability/noise, and this makes the denominator as a whole go down. So, both the numerator and the denominator go down as you remove more components: the former because the partial covariance goes down as you remove common/systematic variability, the latter because the components catch more and more individual variability.
Now, there comes a point at which the denominator starts decreasing faster than the numerator, because you are removing more individual variability than systematic variability from the data. This makes the partial correlation go up. The MAP criterion computes the average of the squared partial correlations, and tells you to stop when this average stops going down and starts going up, i.e. when you start removing more individual variability than common variability in the data.
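If a concrete version helps, here is a minimal numpy sketch of that idea (my own illustration, not Velicer's code): for each number $m$ of retained components it partials the first $m$ principal components out of $\mathbf{R}$ and records the average squared off-diagonal partial correlation; the number of components to retain is the $m$ at which that average is smallest.

```python
import numpy as np

def velicer_map(R):
    """Velicer's minimum average partial (MAP) test on a p x p correlation matrix R.

    Returns the average squared partial correlation for m = 0, ..., p-1
    retained components, and the m that minimizes it.
    """
    p = R.shape[0]

    # Principal component loadings from the eigen-decomposition of R.
    evals, evecs = np.linalg.eigh(R)
    order = np.argsort(evals)[::-1]                     # largest eigenvalues first
    evals, evecs = evals[order], evecs[:, order]
    loadings = evecs * np.sqrt(np.clip(evals, 0, None))

    off_diag = ~np.eye(p, dtype=bool)
    avg_sq_partial = np.empty(p)
    avg_sq_partial[0] = np.mean(R[off_diag] ** 2)       # m = 0: nothing partialled out

    for m in range(1, p):
        A = loadings[:, :m]
        C = R - A @ A.T                                 # partial covariances after removing m components
        d = np.sqrt(np.diag(C))
        partial = C / np.outer(d, d)                    # standardize: partial correlations
        avg_sq_partial[m] = np.mean(partial[off_diag] ** 2)

    return avg_sq_partial, int(np.argmin(avg_sq_partial))
```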