Solved – Accessing PCA components from caret object in R

caret, pca, r

I know how to build a model on PCA components with the caret package, but I don't know which variables contribute to which PCA components. I need some help with this.

When I perform the preprocessing separately, like this:

trans<-preProcess(training_cl,method="pca",preProcOptions = list(thresh = 0.8))

I can check the PCA components of the data, like this:

trans$rotation

However, when I run the PCA through caret's train function:

  1. I don't know which variables explain which PCA components (I don't know how to access $rotation).
  2. I don't get the same number of PCA components as with the code above (even when I set the same threshold).

Example code using caret:

fitControl <- trainControl(method = "cv",
                           number = 3,
                           preProcOptions = list(thresh = 0.80, pcaComp = NULL))
gbmGrid <- expand.grid(interaction.depth = seq(3, 5, 10),
                       n.trees = seq(100, 130, 10),
                       shrinkage = c(0.1),
                       n.minobsinnode = 10)

gbmFit <- train(classe ~ .,
                method = "gbm",
                data = training_cl,
                trControl = fitControl,
                metric = "Accuracy",
                tuneGrid = gbmGrid,
                preProc = "pca",
                verbose = FALSE)

How can I know which variables explain which PCA components when I use the caret package?

Best Answer

First, the rotation matrix can be accessed via:

gbmFit$preProcess$rotation

This works because gbmFit$preProcess is a "preProcess" object, just as trans is, so you can access the rotation matrix the same way.
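
As a rough illustration of reading that matrix, here is a minimal sketch. It assumes the built-in iris data and a "knn" model as stand-ins for training_cl and the gbm fit (both are illustrative choices, not from the question), and it ranks variables by absolute loading, which is one common but not the only way to decide which variables "explain" a component.

```r
library(caret)

set.seed(1)

# Stand-in setup: iris replaces training_cl, Species replaces classe,
# and "knn" replaces "gbm" so that no extra modelling package is needed.
ctrl <- trainControl(method = "cv",
                     number = 3,
                     preProcOptions = list(thresh = 0.80))
fit <- train(Species ~ .,
             data = iris,
             method = "knn",
             preProcess = "pca",
             trControl = ctrl)

# Loadings: one row per original variable, one column per retained component.
rot <- fit$preProcess$rotation
print(round(rot, 3))

# For each component, the two variables with the largest absolute loading.
top_vars <- apply(abs(rot), 2, function(x) {
  names(sort(x, decreasing = TRUE))[1:2]
})
print(top_vars)
```

The same two steps should carry over to the fitted object from the question by replacing fit with gbmFit.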

As for the discrepancy, I think it may be because you appear to be passing the whole training_cl data frame to the preProcess function without excluding the classe column, whereas train only preprocesses the predictors. Could that be the problem?
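
One way to test this is to count the retained components under each setup. The sketch below again uses iris as a stand-in for training_cl, with Species playing the role of classe. It also checks a second possible cause that the answer does not mention, so treat it as an assumption: preProcOptions is an argument of trainControl, while preProcess takes thresh directly, so a preProcOptions list passed straight to preProcess may be silently absorbed by `...` and leave the default threshold in place.

```r
library(caret)

# Stand-in data: iris for training_cl, Species for classe.
predictors <- iris[, setdiff(names(iris), "Species")]

# (a) Whole data frame, options passed the way the question passes them.
pp_question <- preProcess(iris,
                          method = "pca",
                          preProcOptions = list(thresh = 0.80))

# (b) Predictors only, threshold passed as preProcess's own argument.
pp_direct <- preProcess(predictors, method = "pca", thresh = 0.80)

# Number of retained components under each setup.
n_components <- c(question_style = ncol(pp_question$rotation),
                  predictors_only = ncol(pp_direct$rotation))
print(n_components)

# Threshold each object actually stored; a mismatch here would point to
# the option not reaching preProcess in call (a).
print(c(question_style = pp_question$thresh,
        predictors_only = pp_direct$thresh))
```

Comparing these counts with ncol(gbmFit$preProcess$rotation) on the real data should show which setup train matches. If classe is stored as a factor, the caret documentation says preProcess ignores non-numeric columns, in which case dropping it would not change the count and the threshold becomes the more likely cause.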
