I am (very) new to this, but I'll do my best to help. The answers to your questions are:
Am I justified in removing the other 8 principal components?
I do not think you are "justified". But if you want to make a first, coarse assessment of the data you can concentrate on the first PC; just bear in mind that you are neglecting 9% of the total variability. This should lead you to ask further questions: were the variables expected to be so strongly correlated? Could you explain (or simulate) this remaining 9% of variability simply by invoking measurement error?
How do I interpret 91% of explained variance on one component?
It means that the many variables you included are very highly correlated, or that at least two of them are while the others show much smaller dispersion. When you write the first PC in terms of the original measurements, how many of its coefficients are of appreciable size?
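If it helps, here is a minimal scikit-learn/numpy sketch of that kind of check, on made-up data standing in for yours (all names and numbers are illustrative): look at the split of explained variance, then at the coefficients of the first PC on the original variables.

```python
import numpy as np
from sklearn.decomposition import PCA

# Illustrative data: 10 variables all driven by one common signal plus a little noise
rng = np.random.default_rng(0)
signal = rng.normal(size=(200, 1))
X = signal + 0.15 * rng.normal(size=(200, 10))

pca = PCA().fit(X)

# Fraction of total variance carried by each component (the first one dominates here)
print(pca.explained_variance_ratio_.round(3))

# First PC written in terms of the original variables:
# how many coefficients are far from zero?
print(pca.components_[0].round(3))
```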
If I only kept one component what would be the best way to visualize the data?
If you kept only one component, your final description of the data would be one-dimensional, so a single axis would do the job. I repeat myself, and please do not take my words as patronizing, but I would first try to understand whether the PC you calculated makes sense given the data.
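As one way of doing that (continuing the illustrative sketch above, so `X` is assumed to already exist), project the observations onto the first PC and show the scores along a single axis, e.g. as a rug of ticks plus a histogram:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

# Scores of the observations on the first principal component (X as in the previous sketch)
scores = PCA(n_components=1).fit_transform(X)[:, 0]

fig, (ax_rug, ax_hist) = plt.subplots(2, 1, figsize=(6, 3), sharex=True)
ax_rug.plot(scores, np.zeros_like(scores), "|", markersize=12)  # each point as a tick on an axis
ax_rug.set_yticks([])
ax_hist.hist(scores, bins=30)                                   # the same scores as a histogram
ax_hist.set_xlabel("score on PC1")
plt.tight_layout()
plt.show()
```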
The two citations do not generally contradict each other, and both look correct to me. The only quibble is with "Perhaps you mean sum of squared loadings for a principal component, after rotation":
one should rather drop the word "principal", since rotated components or factors are, to be rigorous, not "principal" anymore. Also (important!) the second citation is correct only when the "factor analysis" is actually the PCA method of extraction (as it is in SPSS by default), so that the factors are just principal components. But the table you present is not from PCA, and I wonder whether the two quotes come from the same text and whether there was some misprint.
In the extraction summary table you display, 23 variables were analyzed. The eigenvalues of their correlation matrix are shown in the left section, "Initial eigenvalues". No factors have been extracted yet. These eigenvalues correspond to the variances of principal components (i.e. as if PCA had been performed), not of factors. The adjective "initial" means "at the initiation point of the analysis" and does not imply that there must be some "final" eigenvalues.
The (default in SPSS) Kaiser rule "eigenvalues > 1" was used to decide how many factors to extract; so, 4 factors will be extracted. The "eigenvalues > 1" rule is based on PCA's eigenvalues (i.e. the eigenvalues of the intact, input correlation matrix).
The factors were then extracted by the principal axis method, and the matrix of loadings was obtained. The sums of squared loadings in the columns of that matrix are the factors' variances after extraction. These values appear in the middle section of your table.
These numbers, generally, should not be called eigenvalues, because factor extraction is not necessarily based directly on the eigendecomposition of the input data: the extraction methods are specific algorithms in their own right. Even the principal axis method, which does involve eigenvalues, deals with the eigenvalues of an iteratively re-estimated ("trained") matrix, not of the original correlation matrix.
But if you had been doing PCA instead of FA, then the 4 numbers in the middle section would have been the first 4 eigenvalues, identical to the 4 largest ones on the left: in PCA no fitting takes place, the extracted "latent variables" are the PCs themselves, and their eigenvalues are their variances.
In the right section, the sums of squared loadings after rotation of the factors are shown: the variances of these new, rotated factors. Please read more about rotated factors (or components), especially footnote 4, and note that they are neither "principal" anymore nor in one-to-one correspondence with the extracted ones. After rotation, the "2nd" factor, for example, is not "the 2nd extracted factor, rotated"; it could even have a greater variance than the "1st" one.
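A small numpy sketch may make the distinction concrete. It is only an illustration: it uses PCA loadings (so that the pre-rotation column sums of squared loadings really are eigenvalues) and an arbitrary orthogonal rotation standing in for varimax, on made-up data.

```python
import numpy as np

# Made-up correlated data: 6 variables driven by 2 underlying dimensions plus noise
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2)) @ rng.normal(size=(2, 6)) + 0.3 * rng.normal(size=(300, 6))
R = np.corrcoef(X, rowvar=False)

eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

k = 2
A = eigvecs[:, :k] * np.sqrt(eigvals[:k])      # loading matrix of the first k components

# Before rotation, the column sums of squared loadings equal the eigenvalues
print(eigvals[:k].round(3), (A**2).sum(axis=0).round(3))

# Any orthogonal rotation (an arbitrary 30-degree one here, standing in for varimax)
t = np.deg2rad(30)
T = np.array([[np.cos(t), -np.sin(t)],
              [np.sin(t),  np.cos(t)]])
A_rot = A @ T

# After rotation, the per-factor SS loadings change (they are no longer eigenvalues),
# but their total -- the variance explained by the set of factors -- is unchanged
print((A_rot**2).sum(axis=0).round(3))
print((A**2).sum().round(3), (A_rot**2).sum().round(3))
```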
So,
- No, you can't speak of eigenvalues after rotation, no matter whether it is orthogonal or oblique.
- You can't even speak of - or should at least better avoid speaking of - eigenvalues after the extraction of factors, unless those factors are principal components$^1$. (An instructive example showing confusion similar to yours, with ML factor extraction.) Variances of factors are, in general, sums of squared (SS) loadings, not eigenvalues.
- Rotated factors do not correspond one-to-one to the extracted ones.
- The % of total variation explained by the factors is 40.477% in your example, not 50.317%. The first number is smaller because FA factors explain the assumed common variation, which is less than the portion of total variation skimmed off by the same number of PCs (a short worked calculation follows the footnote below). You may say in your report: "The 4-factor solution is responsible for the common variance, constituting 40.5% of the total variance, while 4 principal components would account for 50.3% of the total variance".
$^1$ (Before factor rotation) the variances of factors (pr. components) are the eigenvalues of the correlation/covariance matrix of the data if the "FA" is the PCA method of extraction; the variances of factors are the eigenvalues of the reduced correlation/covariance matrix (with final communalities on the diagonal) if the FA extraction method is PAF; and the variances of factors do not correspond to eigenvalues of the correlation/covariance matrix in other FA methods such as ML, ULS, GLS (see). In all cases, the variances of orthogonal factors are the SS of the extracted/rotated - final - loadings.
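For concreteness, here is the arithmetic behind that last bullet. The percentages are just the column totals (eigenvalues or SS loadings) divided by the number of variables, 23; the totals of roughly 11.57 and 9.31 are back-calculated from the percentages quoted above, since the table itself is not reproduced here:

$$
\%\ \text{explained} \;=\; \frac{\sum_{k=1}^{4} \text{SS}_k}{23}\times 100\%,
\qquad
\frac{11.57}{23}\times 100\% \approx 50.3\%\ \text{(initial eigenvalues)},
\qquad
\frac{9.31}{23}\times 100\% \approx 40.5\%\ \text{(extraction SS loadings)}.
$$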
Best Answer
Barring numerical issues, all the eigenvalues should be non-negative (since covariance matrices are positive (semi-)definite), so there is really no need to use an absolute value anywhere. In some cases, $\lambda_i$ could be near zero though (for small principal components) and end up negative due to numerics, so I suppose this protects you against that. See e.g. here. The equation is computing $ S_i = |\lambda_i|/\sum_i \lambda_i $, where $S_i$ is the significance (meaning the percent of variance explained here). Note that the total variance equals $\mathbb{V}[D] = \sum_i \lambda_i $. See e.g. here. Thus $S_i$ is exactly the proportion of total variance explained/covered by the axis defined by the $i$th principal component.
Both, since $\lambda_i \geq 0$.
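A minimal numpy sketch of that computation, on illustrative data (the `abs` only guards against tiny negative eigenvalues from round-off):

```python
import numpy as np

rng = np.random.default_rng(0)
D = rng.normal(size=(500, 4)) @ rng.normal(size=(4, 4))   # illustrative data matrix

X = D - D.mean(axis=0)                    # center the data
C = X.T @ X / (len(D) - 1)                # covariance matrix
lam = np.linalg.eigvalsh(C)[::-1]         # eigenvalues, largest first

# S_i = |lambda_i| / sum_i lambda_i: proportion of total variance per principal component
S = np.abs(lam) / lam.sum()
print(S.round(4), S.sum())                # the proportions sum to 1
```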
Edit in response to comments (082519):
We need a bit of math to disentangle everything I think. Also note that not everyone uses the same notation unfortunately. Let $X = D - \mu$ be the centered data matrix ($X,D\in \mathbb{R}^{n\times m}$) where $n$ is the number of data points (rows) and $m$ is the number of features (columns), and $\mu$ is the columnwise means of $D$. Then the covariance matrix $C\in\mathbb{R}^{m\times m}$ can be written $$ C = \frac{1}{n-1}X^TX = V\Lambda V^T $$ where $\Lambda = \text{diag}(\lambda_1,\ldots,\lambda_m)$ are the eigenvalues and the columns $v_i$ of $V$ are the eigenvectors of $C$.
This can also be computed with the SVD: $X = U\Sigma V^T$, where the singular values $\Sigma=\text{diag}(\sigma_1,\ldots,\sigma_m)$ are related to the covariance eigenvalues via $\lambda_\ell = \sigma^2_\ell/(n-1)$.
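Continuing the sketch above (with the same centered `X` and eigenvalues `lam`), the SVD route recovers exactly those eigenvalues, which is easy to verify:

```python
# SVD of the centered data matrix X from the sketch above
U, sigma, Vt = np.linalg.svd(X, full_matrices=False)

lam_from_svd = sigma**2 / (len(X) - 1)     # lambda_l = sigma_l^2 / (n - 1)
print(np.allclose(lam_from_svd, lam))      # True (both are in descending order)
```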
Anyway, the eigendecomposition of $C$ gives us a new space in which to place the data: the principal components are the columns of $V$ and the axes of the new space. So $v_1$ is the first principal component and also the first axis of the new space. The associated eigenvalue $\lambda_1$ is closely related to the significance/importance (it can be used to compute the explained variance).
The loadings are generally defined as the eigenvectors (principal components $v_i$) scaled by their standard deviations per new dimension. (Recall that the $v_i$ are unit-length vectors defining the new data space.) In other words: $$ L_i = \sqrt{\lambda_i}\, v_i $$ However, I'm not a stats person so I don't tend to use loadings much (maybe see here for more).
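And, still on the same illustrative matrices, the loadings under this definition are just the eigenvector columns scaled by $\sqrt{\lambda_i}$:

```python
V = Vt.T                                   # columns v_i are the principal components
L = V * np.sqrt(lam_from_svd)              # L_i = sqrt(lambda_i) * v_i (column-wise scaling)

# Sanity check: the column sums of squared loadings recover the eigenvalues (variances)
print(np.allclose((L**2).sum(axis=0), lam_from_svd))   # True
```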
Ok, for the comments:
Yeah, it's exactly the $S_i$ I defined above.
Yes, $v_1$ where $\lambda_1 = \max_j \lambda_j$ is the first PC.
The eigenvalues $\lambda_k$ are indeed all positive. But the values you're talking about seem to be $v_{ij}$ for different $j$s. That is, for the $i$th PC, the $j$th component $v_{ij}$ is indeed exactly the contribution of the $j$th feature to the $i$th PC. The loading value $L_{ij}=v_{ij} \sqrt{\lambda_i}$ is the same, just scaled by the importance. In other words, the loadings are elements of the (scaled) eigenvectors, not the eigenvalues; but they do indeed represent the feature contributions to each PC. Of course, if a feature contributes negatively, the loading value $L_{ij}$ can be negative. The eigenvalues (variances), on the other hand, pertain to the "importance" of the PCs, not the features, and should be non-negative.