Solved – Is the first principal component the one with the largest eigenvalue and how to convert it to explained variance

covariance-matrix, eigenvalues, pca

In PCA, after we calculate the eigenvalues of the covariance matrix, we need to get the explained variance. I read an article which suggests:

significance = [np.abs(i)/np.sum(eigenval) for i in eigenval]

I have three questions:

  • In this code, why don't we use the sum of the absolute values of 'eigenval'?
  • Why do we use 'np.abs(i)/np.sum(eigenval)' to get the explained variance?
  • Is the first principal component the one with the largest eigenvalue or the largest absolute eigenvalue?

Best Answer

  • Barring numerical issues, all the eigenvalues should be non-negative (since covariance matrices are positive (semi-)definite), so there is no need to use the absolute value anywhere, really. In some cases, though, 'i' could be near zero (for small eigenvalues) and end up slightly negative due to numerics. So I suppose this protects you against that. See e.g. here.

  • The equation is computing $ S_i = |\lambda_i|/\sum_j \lambda_j $, where $S_i$ is the significance (meaning the percent of variance explained here). Note that the total variance equals $\mathbb{V}[D] = \sum_j \lambda_j $. See e.g. here. Thus $S_i$ is exactly the proportion of the total variance explained/covered by the axis defined by the $i$th principal component (see the short sketch after this list).

  • Both, since $\lambda_i \geq 0$.
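
For concreteness, here is a minimal sketch (my own, not from the article being quoted) of how such a 'significance' list relates to the covariance eigenvalues; the variable names mirror the snippet in the question, and the data are made up:

    import numpy as np

    rng = np.random.default_rng(0)
    D = rng.normal(size=(100, 3))            # hypothetical data: 100 samples, 3 features

    C = np.cov(D, rowvar=False)              # sample covariance matrix (features in columns)
    eigenval, eigenvec = np.linalg.eigh(C)   # symmetric matrix -> real eigenvalues
    order = np.argsort(eigenval)[::-1]       # sort so that index 0 corresponds to PC1
    eigenval, eigenvec = eigenval[order], eigenvec[:, order]

    # np.abs only guards against tiny negative eigenvalues caused by round-off.
    significance = [np.abs(i) / np.sum(eigenval) for i in eigenval]
    print(significance)                      # proportions of explained variance, summing to 1

Here 'np.sum(eigenval)' is the total variance $\mathbb{V}[D]$, so each entry is exactly $S_i$.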


Edit in response to comments (08/25/19):

We need a bit of math to disentangle everything I think. Also note that not everyone uses the same notation unfortunately. Let $X = D - \mu$ be the centered data matrix ($X,D\in \mathbb{R}^{n\times m}$) where $n$ is the number of data points (rows) and $m$ is the number of features (columns), and $\mu$ is the vector of columnwise means of $D$. Then the covariance matrix $C\in\mathbb{R}^{m\times m}$ can be written $$ C = \frac{1}{n-1}X^TX = V\Lambda V^T $$ where $\Lambda = \text{diag}(\lambda_1,\ldots,\lambda_m)$ holds the eigenvalues and the columns $v_i$ of $V$ are the eigenvectors of $C$.

This can also be computed with the SVD: $X = U\Sigma V^T$, where the singular values $\Sigma=\text{diag}(\sigma_1,\ldots,\sigma_m)$ are related to the covariance eigenvalues via $\lambda_\ell = \sigma^2_\ell/(n-1)$.
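
As a small numerical check (my own sketch, assuming numpy and synthetic data), the eigendecomposition of $C$ and the SVD of $X$ agree in exactly this way:

    import numpy as np

    rng = np.random.default_rng(1)
    n, m = 200, 4
    D = rng.normal(size=(n, m))

    X = D - D.mean(axis=0)                         # centered data matrix
    C = X.T @ X / (n - 1)                          # covariance matrix, m x m

    lam, V = np.linalg.eigh(C)                     # eigenvalues/eigenvectors of C
    lam, V = lam[::-1], V[:, ::-1]                 # decreasing order, to match the SVD

    U, sigma, Vt = np.linalg.svd(X, full_matrices=False)

    print(np.allclose(lam, sigma**2 / (n - 1)))    # lambda_l = sigma_l^2 / (n - 1)
    print(np.allclose(C, V @ np.diag(lam) @ V.T))  # C = V Lambda V^T

Both checks print True (the sign ambiguity of the eigenvectors doesn't affect $V\Lambda V^T$).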

Anyway, the eigendecomposition of $C$ gives us a new space in which to place the data: the principal components are the columns of $V$ and the axes of the new space. So $v_1$ is the first principal component and also the first axis of the new space. The associated eigenvalue $\lambda_1$ is closely related to the significance/importance (it can be used to compute the explained variance).

The loadings are generally defined as the eigenvectors (principal components $v_i$) scaled by the standard deviation along each new dimension. (Recall that the $v_i$ are unit-length vectors defining the new data space.) In other words: $$ L_i = \sqrt{\lambda_i}\, v_i $$ However, I'm not a stats person so I don't tend to use loadings much (maybe see here for more).
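
A short, self-contained sketch of this definition (again my own, with made-up data):

    import numpy as np

    rng = np.random.default_rng(2)
    D = rng.normal(size=(50, 3))
    X = D - D.mean(axis=0)
    C = X.T @ X / (X.shape[0] - 1)

    lam, V = np.linalg.eigh(C)
    lam, V = lam[::-1], V[:, ::-1]     # sort PCs by decreasing eigenvalue

    loadings = V * np.sqrt(lam)        # column i holds L_i = sqrt(lambda_i) * v_i
    print(loadings)                    # entries can be negative, unlike the eigenvalues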

Ok, for the comments:

the percentage of each eigenvalue is the explained variance right?

Yeah, it's exactly the $S_i$ I defined above.

The one with the largest eigenvalue is called PC1, is that correct?

Yes, $v_1$ where $\lambda_1 = \max_j \lambda_j$ is the first PC.

You mentioned that all the eigenvalues should be positive, but when we get the 'loading' of each PC (the weight with which each feature contributes to the PC), some values are negative; I wonder what those 'loadings' are?

The eigenvalues $\lambda_k$ are indeed all non-negative. But the values you're talking about seem to be $v_{ij}$ for different $j$s. That is, for the $i$th PC, the $j$th component ($v_{ij}$) is indeed exactly the contribution of the $j$th feature to the $i$th PC. The loading value $L_{ij}=v_{ij} \sqrt{\lambda_i}$ is the same, just scaled by the importance. In other words, the loadings are elements of the eigenvectors, not the eigenvalues. But they do indeed represent the feature contributions to each PC. Of course, if a feature contributes negatively, the loading values $L_{ij}$ can be negative. On the other hand, the eigenvalues (variances) pertain to the "importance" of the PCs, not the features, and should be non-negative.
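
To make the sign issue concrete, here is a tiny made-up example (two anti-correlated features, not from the question): the eigenvalues are non-negative, but the leading eigenvector, and hence the loadings, contain a negative entry:

    import numpy as np

    C = np.array([[ 1.0, -0.8],
                  [-0.8,  1.0]])        # covariance of two anti-correlated features

    lam, V = np.linalg.eigh(C)
    lam, V = lam[::-1], V[:, ::-1]      # PC1 first

    print(lam)                          # [1.8, 0.2] -- both non-negative
    print(V[:, 0])                      # e.g. [0.707, -0.707]: one feature contributes negatively
    print(np.sqrt(lam[0]) * V[:, 0])    # PC1 loadings; one entry is negative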