Solved – How to export and use results of PCA from R


My ultimate goal is to run a cluster analysis on a data set with more than 1 million records. The inputs to the clustering will be the results of a Principal Component Analysis plus other variables not included in the PCA, for a total of maybe 10 variables. The variables I put into the PCA were all very highly correlated with one another, while the other variables are not, so I chose to leave those out of the PCA.

#read data
mydata <- read.csv('mydata.csv') 

#import library for robust methods because my data contained outliers
library(rrcov) 

#run robust PCA method called PcaCov
pcaR <- PcaCov(~., mydata, na.action=na.omit, center=TRUE, scale = TRUE, k=8)

#look at results
summary(pcaR)
screeplot(pcaR)
pcaR@loadings

From the results, I have decided I would like to retain the first three components, which capture ~87% of the total variance in the dataset.

Now I want to extract/save/export these first three components for use in the cluster analysis with my other variables. How do I do this?

Best Answer

Each component obtained by PCA has a loading vector, for example $v=(1,-2,5,5)$. This vector defines the new variable as a linear combination of the original ones: $z_1=x_1-2x_2+5x_3+5x_4$. You can therefore build a new matrix whose columns are these linear combinations, one column per retained component, and use its first three columns together with your other variables as input to the clustering. Because you ran the PCA with centering and scaling, the $x_i$ here are the centered and scaled variables, not the raw ones.
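As a rough sketch of that idea, the R code below computes component scores by hand, keeps the first three, binds them to the extra clustering variables, and writes the result to a CSV file. It uses base R's `prcomp` on simulated data as a stand-in so that it runs without extra packages; the data, the column names (`x1`..`x4`, `other1`, `other2`) and the output file name are all hypothetical. For the `rrcov` object in the question, the scores should already be stored in the fitted object (via `getScores(pcaR)` or the `scores` slot), so the manual multiplication is mainly there to show what the scores are.

```r
# Hypothetical stand-in data: four highly correlated variables for the PCA
set.seed(42)
n <- 200
latent <- rnorm(n)
pca_vars <- data.frame(
  x1 = latent + rnorm(n, sd = 0.2),
  x2 = -latent + rnorm(n, sd = 0.2),
  x3 = 2 * latent + rnorm(n, sd = 0.2),
  x4 = latent + rnorm(n, sd = 0.2)
)
# Hypothetical variables that go into the clustering but not into the PCA
other_vars <- data.frame(other1 = runif(n), other2 = rnorm(n))

# Classical PCA as a stand-in for the robust PcaCov fit in the question
pca <- prcomp(pca_vars, center = TRUE, scale. = TRUE)

# Scores by hand: centre and scale the data, then multiply by the loadings
z <- scale(pca_vars, center = pca$center, scale = pca$scale) %*% pca$rotation

# These match the scores stored in the fitted object
scores_match <- isTRUE(all.equal(z, pca$x, check.attributes = FALSE))

# Keep the first three components and combine them with the other variables.
# With rrcov the analogous step would presumably be getScores(pcaR)[, 1:3];
# note that na.action = na.omit drops incomplete rows, so the other
# variables must be restricted to the same rows before cbind().
scores3 <- as.data.frame(z[, 1:3])
cluster_input <- cbind(scores3, other_vars)

# Export for later use (file name is hypothetical)
out_file <- file.path(tempdir(), "cluster_input.csv")
write.csv(cluster_input, out_file, row.names = FALSE)
```

The resulting `cluster_input` data frame has one row per record and five columns here (three components plus two other variables), and can be passed directly to a clustering function or read back from the CSV later.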