pca – Extrapolating Principal Component Factors with Other Variables

biostatistics, factor-rotation, pca

Hi StackExchange Community,
I am performing a principal component analysis (PCA). I would like to know how to relate some of the PCA components to other variables that were not included in the PCA.

I have a nutritional survey with 60 questions that was administered to 420 people. Frequency of consumption was measured in servings and is standardized for each type of food. I have clearly identified components using the following criteria:
a. Components selected by eigenvalue > 1.5
b. Variables retained if their varimax-rotated loading is > 0.2

The results of PCA + varimax rotation:

PC1: Orange, Apple, Watermelon
PC2: Homemade fries, Mayonnaise, Pizza
PC3: Eggs, Walnuts, Hazelnuts
PC4: Whitefish, fatty fish small, fatty fish big

Next, I want to know whether I can carry out a post-PCA statistical analysis using each subject's standardized varimax-rotated component scores, and cross-check those scores against other potentially confounding variables such as sex, age, and education level.

This table illustrates what I want to compute:
https://ijbnpa.biomedcentral.com/articles/10.1186/s12966-016-0353-2/tables/4

Other studies where a similar approach was applied:

  1. https://www.mdpi.com/2072-6643/13/1/70#app1-nutrients-13-00070
  2. https://www.cambridge.org/core/journals/british-journal-of-nutrition/article/comparison-of-cluster-and-principal-component-analysis-techniques-to-derive-dietary-patterns-in-irish-adults/2130E0404EA1C0AC9CF4382839DE3498

Can I recover each subject's position (score) on the components? With those scores, I think I could run an ANOVA or a chi-square test against confounding variables such as sex, education, and dietary calories. I tried to do this using the information at this link, but I'm not sure it's correct:
How to compute varimax-rotated principal components in R?
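A minimal sketch of those follow-up tests, assuming you already have one column of rotated scores per subject. The data frame `dat`, its column names, and the simulated values are hypothetical placeholders for the survey data; `aov`, `lm`, `quantile`, `cut`, and `chisq.test` are base R. A multivariable `lm` adjusts for several covariates at once, while cutting the score into quintiles is one option for producing a categorical cross-tabulation for a chi-square test.

```r
# Sketch: relate each subject's component score to covariates.
set.seed(2)
n <- 420
# Hypothetical stand-ins: one rotated component score plus survey covariates
dat <- data.frame(
  score     = rnorm(n),
  sex       = factor(sample(c("F", "M"), n, replace = TRUE)),
  age       = round(runif(n, 18, 80)),
  education = factor(sample(c("primary", "secondary", "tertiary"), n, replace = TRUE))
)

# One-way ANOVA: does the mean score differ by education level?
fit_aov <- aov(score ~ education, data = dat)
summary(fit_aov)

# Linear model: score as a function of several covariates at once
fit_lm <- lm(score ~ sex + age + education, data = dat)
summary(fit_lm)

# Chi-square: cut the score into quintiles and cross-tabulate with sex
dat$quintile <- cut(dat$score,
                    breaks = quantile(dat$score, probs = seq(0, 1, 0.2)),
                    include.lowest = TRUE, labels = paste0("Q", 1:5))
tab <- table(dat$quintile, dat$sex)
chisq.test(tab)
```

With real data, `score` would be one column of the rotated score matrix and the covariates would come from the survey; repeat the tests for each retained component.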

# R code (run in RStudio)
library(factoextra)
# PCA
prc <- prcomp(df, center = TRUE, scale. = TRUE)
prc$sdev^2  # eigenvalues; keep components with eigenvalue > 1.5

# Varimax rotation and loadings
varimax_df <- varimax(prc$rotation[, 1:4])
varimax_df$loadings
varimax_df$rotmat

# Project each standardized row onto the rotated components (intended: one score per subject)
newData <- scale(df) %*% varimax_df$loadings

Thanks!

Best Answer

The method you selected from the page you cite is incorrect, or at least not standard, as the author of that answer explains below the code that you used. It applies the varimax rotation to the original eigenvectors from the PCA, which is not standard practice.

For this type of analysis, "Loadings are eigenvectors scaled by the square roots of the respective eigenvalues," as explained on that page in the answer from @amoeba, while your prc$rotation values are unscaled eigenvectors. Of the 3 correct methods shown in that answer, the one perhaps closest to your code (using the first 4 principal components) might be translated to:

rawLoadings     <- prc$rotation[,1:4] %*% diag(prc$sdev, 4, 4) # scaling
rotatedLoadings <- varimax(rawLoadings)$loadings # varimax rotation after scaling
invLoadings     <- t(pracma::pinv(rotatedLoadings)) # transpose of generalized inverse
scores          <- scale(df) %*% invLoadings
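As a cross-check of the code above, here is a minimal sketch on simulated data; the data frame `df` is a hypothetical stand-in for the survey, built from three blocks of correlated items. It assumes the retained loading matrix L has full column rank, in which case the transposed pseudoinverse equals `L %*% solve(t(L) %*% L)`, so base R's `solve()` can stand in for `pracma::pinv()`. Because `varimax()` returns rotated loadings equal to the scaled eigenvectors times its `rotmat`, the same scores can also be obtained by standardizing the first k principal-component scores in `prc$x` and multiplying by `rotmat`; those scores stay uncorrelated with unit variance.

```r
# Sketch: varimax-rotated component scores two ways, on hypothetical data.
set.seed(1)
n <- 420; p <- 12
f  <- matrix(rnorm(n * 3), n, 3)                      # three latent "patterns"
df <- as.data.frame(f[, rep(1:3, each = 4)] + matrix(rnorm(n * p), n, p))

prc <- prcomp(df, center = TRUE, scale. = TRUE)
k   <- sum(prc$sdev^2 > 1.5)                          # criterion a: eigenvalue > 1.5

# Loadings = eigenvectors scaled by sqrt(eigenvalues), then varimax
rawLoadings     <- prc$rotation[, 1:k] %*% diag(prc$sdev[1:k], k, k)
vm              <- varimax(rawLoadings)
rotatedLoadings <- unclass(vm$loadings)

# Criterion b: variables with |rotated loading| > 0.2 on each component
lapply(seq_len(k), function(j)
  rownames(rotatedLoadings)[abs(rotatedLoadings[, j]) > 0.2])

# Way 1: Z %*% t(pinv(L)), with t(pinv(L)) = L (L'L)^-1 for full-rank L
Z <- scale(df)
scores_pinv <- Z %*% rotatedLoadings %*% solve(crossprod(rotatedLoadings))

# Way 2: standardize the first k PC scores, then apply the varimax rotation
scores_rot <- scale(prc$x[, 1:k]) %*% vm$rotmat
```

With the real survey, replace the simulated `df`; `k` then follows criterion a from the question instead of being fixed at 4.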

To avoid errors, consider using a package that has been vetted to give correct results, such as the R psych package. That approach is also illustrated in the answer from @amoeba.
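A minimal sketch of the psych route, assuming the psych package is installed (the code is skipped otherwise). `psych::principal()` runs the PCA, applies the varimax rotation, and with `scores = TRUE` returns one standardized score per subject and component in `$scores`. The simulated `df` is a hypothetical stand-in for the survey data, and `nfactors = 3` matches that simulation rather than the question's four components. psych may order and sign the rotated components differently from a hand-rolled version, so compare results column by column rather than by position.

```r
# Sketch: PCA + varimax + component scores via psych (if installed).
set.seed(1)
n <- 420; p <- 12
f  <- matrix(rnorm(n * 3), n, 3)                      # hypothetical latent patterns
df <- as.data.frame(f[, rep(1:3, each = 4)] + matrix(rnorm(n * p), n, p))

if (requireNamespace("psych", quietly = TRUE)) {
  pc <- psych::principal(df, nfactors = 3, rotate = "varimax", scores = TRUE)
  print(pc$loadings, cutoff = 0.2)   # hide loadings with |value| < 0.2
  print(head(pc$scores))             # one row of rotated scores per subject
}
```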
