R PCA – Difference in Eigenvalues Between ‘princomp’ and ‘prcomp’ Functions

pcar

You can use the decathlon dataset {FactoMineR} to reproduce this. The question is why the computed eigenvalues differ from those of the covariance matrix.

Here are the eigenvalues using princomp:

> library(FactoMineR);data(decathlon)
> pr <- princomp(decathlon[1:10], cor=F)
> pr$sd^2
      Comp.1       Comp.2       Comp.3       Comp.4       Comp.5       Comp.6 
1.348073e+02 2.293556e+01 9.747263e+00 1.117215e+00 3.477705e-01 1.326819e-01 
      Comp.7       Comp.8       Comp.9      Comp.10 
6.208630e-02 4.938498e-02 2.504308e-02 4.908785e-03 

And the same using PCA:

> res<-PCA(decathlon[1:10], scale.unit=FALSE, ncp=5, graph = FALSE)
> res$eig
          eigenvalue percentage of variance cumulative percentage of variance
comp 1  1.348073e+02           79.659589641                          79.65959
comp 2  2.293556e+01           13.552956464                          93.21255
comp 3  9.747263e+00            5.759799777                          98.97235
comp 4  1.117215e+00            0.660178830                          99.63252
comp 5  3.477705e-01            0.205502637                          99.83803
comp 6  1.326819e-01            0.078403653                          99.91643
comp 7  6.208630e-02            0.036687700                          99.95312
comp 8  4.938498e-02            0.029182305                          99.98230
comp 9  2.504308e-02            0.014798320                          99.99710
comp 10 4.908785e-03            0.002900673                         100.00000

Can you explain to me why the directly computed eigenvalues differ from those? (the eigenvectors are the same):

> eigen(cov(decathlon[1:10]))$values
 [1] 1.381775e+02 2.350895e+01 9.990945e+00 1.145146e+00 3.564647e-01
 [6] 1.359989e-01 6.363846e-02 5.061961e-02 2.566916e-02 5.031505e-03

Also, the alternative prcomp method gives the same eigenvalues as the direct computation:

> prc <- prcomp(decathlon[1:10])
> prc$sd^2
 [1] 1.381775e+02 2.350895e+01 9.990945e+00 1.145146e+00 3.564647e-01
 [6] 1.359989e-01 6.363846e-02 5.061961e-02 2.566916e-02 5.031505e-03

Why do PCA/princomp and prcomp give different eigenvalues?

Best Answer

As pointed out in the comments, it's because princomp uses $N$ for the divisor, but prcomp and the direct calculation using cov both use $N-1$ instead of $N$.

This is mentioned in both the Details section of help(princomp):

Note that the default calculation uses divisor 'N' for the covariance matrix.

and the Details section of help(prcomp):

Unlike princomp, variances are computed with the usual divisor N - 1.

You can also see this in the source. For example, the snippet of princomp source below shows that $N$ (n.obs) is used as the denominator when calculating cv.

else if (is.null(covmat)) {
    dn <- dim(z)
    if (dn[1L] < dn[2L]) 
        stop("'princomp' can only be used with more units than variables")
    covmat <- cov.wt(z)
    n.obs <- covmat$n.obs
    cv <- covmat$cov * (1 - 1/n.obs)
    cen <- covmat$center
}

You can avoid this multiplication by specifying the covmat argument instead of the x argument.

princomp(covmat = cov(iris[,1:4]))$sd^2

Update regarding PCA scores:

You can set cor = TRUE in your call to princomp in order to perform PCA on the correlation matrix (instead of the covariance matrix). This will cause princomp to $z$-score the data, but it will still use $N$ for the denominator.

As as result, princomp(scale(data))$scores and princomp(data, cor = TRUE)$scores will differ by the factor $\sqrt{(N-1)/N}$.