I know how to build a model using PCA components in the caret package, but I don't know which variables explain which PCA components. I need some help with this.
When I perform the preprocessing separately, like this:
trans<-preProcess(training_cl,method="pca",preProcOptions = list(thresh = 0.8))
I can check the PCA components of the data, like this:
trans$rotation
However, when I perform the PCA as part of model training with the caret package:
- I don't know which variables explain which PCA components (I don't know how to access $rotation).
- I don't get the same number of PCA components as with the code above (even when I set the same threshold).
Example code using caret:
fitControl <- trainControl(method = "cv",
                           number = 3,
                           preProcOptions = list(thresh = 0.80, pcaComp = NULL))
gbmGrid <- expand.grid(interaction.depth = seq(3, 5, 10),
                       n.trees = seq(100, 130, 10),
                       shrinkage = c(0.1),
                       n.minobsinnode = 10)
gbmFit <- train(classe ~ ., method = "gbm",
                data = training_cl,
                trControl = fitControl,
                metric = "Accuracy",
                tuneGrid = gbmGrid,
                preProc = "pca",
                verbose = FALSE)
How can I know which variables explain which PCA components when I use the caret package?
Best Answer
First, the "rotation" part can be accessed via:
gbmFit$preProcess$rotation
The reason is that gbmFit$preProcess is a "preProcess" object, just as trans is. Hence, you can access the rotation matrix the same way.
As for the discrepancy, I think it may be because you are passing the whole training_cl data frame into the preProcess function without excluding the classe column, whereas train only preprocesses the predictors. Do you think that could be the problem?
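To make the comparison concrete, here is a minimal sketch rather than a definitive fix. It assumes the built-in iris data as a stand-in for training_cl, Species in place of classe, and a k-NN model instead of gbm so that it runs quickly. It also passes thresh directly to preProcess: as far as I can tell, preProcess has no preProcOptions argument, so the standalone call in the question may be silently falling back to the default threshold of 0.95, which would be another possible source of the mismatch.

```r
library(caret)

# Assumption: iris stands in for training_cl, Species for classe,
# and k-NN for gbm, purely to keep the sketch small and fast.
set.seed(1)
predictors <- iris[, names(iris) != "Species"]  # drop the outcome column

# Standalone preprocessing: thresh is given directly to preProcess().
trans <- preProcess(predictors, method = "pca", thresh = 0.80)

# The same threshold, routed through trainControl() for use inside train().
ctrl <- trainControl(method = "cv", number = 3,
                     preProcOptions = list(thresh = 0.80))
fit <- train(Species ~ ., data = iris, method = "knn",
             preProcess = "pca", trControl = ctrl)

# Loadings: rows are the original variables, columns are the components.
print(fit$preProcess$rotation)

# Number of components kept by each route, for comparison.
n_standalone <- ncol(trans$rotation)
n_train <- ncol(fit$preProcess$rotation)
print(c(standalone = n_standalone, train = n_train))
```

With your own data, the same two counts can be compared after replacing iris with training_cl and Species with classe.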