My question is more theoretical, but I'll walk you through how I got there.
I fit a PLS regression model on the training set (n=22, 8 variables) and performed 10-fold and LOO CV (no external test set):
library(pls)
train <- plsr(Y ~ X1 + X2 + X3 + X4 + X5 + X6 + X7 + X8, data=mydata, scale=TRUE, validation="none")
tenfold <- plsr(Y ~ X1 + X2 + X3 + X4 + X5 + X6 + X7 + X8, data=mydata, scale=TRUE, validation="CV")
loo <- plsr(Y ~ X1 + X2 + X3 + X4 + X5 + X6 + X7 + X8, data=mydata, scale=TRUE, validation="LOO")
Calling explvar() on each of the above models gives the % explained variance of each component, e.g.:
Comp 1 Comp 2 Comp 3 Comp 4 Comp 5 Comp 6 Comp 7 Comp 8
training 42 12 12 15 7.7 1.7 9.3 0.43
tenfold 42 12 12 15 7.7 1.7 9.3 0.43
loo 42 12 12 15 7.7 1.7 9.3 0.43
Does it make sense that the % explained variance is identical (not just after rounding, I checked) for training, tenfold, and loo? Or is it because my dataset is so small that 10-fold and LOO CV are almost the same (test sets of 2 and 1 samples per fold, respectively), so this is expected? But then, why the match with the training set?
Best Answer
You cannot get a single explained variance per component from LOO or k-fold cross-validation, because a new PLS model has to be fit from scratch each time a sample is left out in LOO, or each time n/k samples are left out in k-fold CV. For example, with 22 samples and LOO CV there are 22 different PLS models and hence 22 different explained variances for each component.
So it is very likely that the values you are seeing all come from the PLS model fit on the full data (the complete training set). Declaring a CV parameter merely adds another object, containing the CV errors, to the resulting model's structure. In other words, neither the type of CV you choose nor whether you enable CV at all affects the fitted model itself; you simply get additional information that helps with, for example, choosing the number of components.
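You can check this yourself; here is a minimal sketch using the yarn dataset that ships with pls (the dataset and formula are illustrative, not from your question):

```r
library(pls)  # assumes the pls package is installed
data(yarn)    # small NIR dataset bundled with pls

# Fit the same model with and without cross-validation
fit_none <- plsr(density ~ NIR, ncomp = 8, data = yarn, validation = "none")
fit_loo  <- plsr(density ~ NIR, ncomp = 8, data = yarn, validation = "LOO")

# explvar() reads from the full-data fit in both cases, so the values match
explvar(fit_none)
explvar(fit_loo)

# The CV results live only in the $validation component of the fitted object
RMSEP(fit_loo)                 # cross-validated prediction error per component
is.null(fit_none$validation)   # TRUE: no CV object when validation="none"
```

The cross-validated information you should compare across tenfold and loo is therefore RMSEP (or MSEP/R2 via the validation object), not explvar().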