Different sources indicate that a PLS regression takes into account the variability of the dependent variables (while PCR doesn't). Why is this aspect so important, and why is it considered to be an advantage over PCR?
You may have confounders that contribute large variance to $\mathbf X$, but because they are confounders, that variance does not help your prediction but rather hinders it. Moreover, such variance may be uncorrelated with the predictions, i.e. it isn't even anything that helps to correct predictions for some other effects.
PCA will have confounders and important influencing factors alike sorted according to the $\mathbf X$-variance they contribute. PLS downweights confounders, as that variance is not correlated with variance in $\mathbf Y$.
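To make the point concrete, here is a minimal sketch on simulated data (the variable names, sample size and variances are my own assumptions, not taken from the question). It compares the first PCA direction of a two-column $\mathbf X$ with the first PLS weight vector, which is proportional to $\mathbf X^\top \mathbf y$:

```python
import math
import random

# Simulated data (hypothetical): one high-variance confounder unrelated to y,
# and one low-variance predictor that actually drives y.
rng = random.Random(1)
n = 2000
confounder = [rng.gauss(0, 5) for _ in range(n)]
signal = [rng.gauss(0, 1) for _ in range(n)]
y = [s + rng.gauss(0, 0.1) for s in signal]

def centered(v):
    m = sum(v) / len(v)
    return [x - m for x in v]

def cov(a, b):
    # Sample covariance of two already-centered vectors.
    return sum(x * z for x, z in zip(a, b)) / (len(a) - 1)

x1, x2, yc = centered(confounder), centered(signal), centered(y)

# First PCA direction: leading eigenvector of the 2x2 covariance matrix
# [[a, b], [b, c]], obtained from the closed-form principal-axis angle.
a, b, c = cov(x1, x1), cov(x1, x2), cov(x2, x2)
theta = 0.5 * math.atan2(2 * b, a - c)
pca_dir = (math.cos(theta), math.sin(theta))

# First PLS weight vector: proportional to X^T y, normalized to unit length.
w = (cov(x1, yc), cov(x2, yc))
norm = math.hypot(*w)
pls_dir = (w[0] / norm, w[1] / norm)

print("first PCA direction (confounder, signal):", pca_dir)
print("first PLS weights   (confounder, signal):", pls_dir)
```

In this setup the first PCA direction should point almost entirely along the confounder, while the first PLS weight vector should point mostly along the predictor that is related to $\mathbf y$. Note that PLS works with covariance, so on unscaled data a high-variance column can still pick up some weight through chance correlation.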
Also, are there any other concrete advantages of conducting a PLS regression instead of a PCR?
Can't think of any right now - but if needed, downweighting confounding variance is a pretty solid advantage.
PLSR extracts more components than the PCA
This would make me suspicious of overfitting. Sure, this may happen. But typically, PLS needs fewer latent variables/components than PCR.
Am I right to say that this [more components] would be a disadvantage of PLSR over PCR?
This disadvantage is usually formulated as PLSR being more prone to overfitting.
How did you determine the number of latent variables/components? You need to make sure the models aren't overfit for both PCR and PLSR, but for PLSR it is even more crucial than for PCR.
Lastly, I noticed that when I perform a PLSR with a small number of predictors, the optimal number of extracted components is approximately equal to the number of predictors. How may this be explained?
Your system may be of full rank, or your procedure to determine the number of latent variables may be overfitting. Without further details (in particular on the application and data) it is impossible to say which.
Best Answer
The reason why 1-component PLS loses to 1-component PLS based on the first 10-20 PCA components is that, in the former case, the correlation structure between the original variables is partially lost (since the 1-component PLS direction is chosen only based on the correlations between the original input variables and the outcome variable). In the latter case, though, the PCA variables are uncorrelated with each other, which makes 1-component PLS equivalent to ordinary least squares (OLS) regression on the PCA variables, at least when those variables are scaled to equal variance (i.e. no information is lost apart from losing the rest of the PCA directions, which are not very informative anyway).
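A short derivation of that equivalence, as a sketch in my own notation (the symbols $Z$, $z_j$, $d_j$, $w$, $t$ are not from the original post). Let $Z$ be the centered $n \times k$ matrix of PCA scores with columns $z_j$, so that $Z^\top Z = \operatorname{diag}(d_1, \dots, d_k)$, and let $y$ be the centered outcome. OLS on $Z$ gives

$$\hat y_{\text{OLS}} = \sum_{j=1}^{k} z_j \, \frac{z_j^\top y}{d_j}.$$

The unnormalized first PLS weight vector is $w = Z^\top y$, i.e. $w_j = z_j^\top y$, with score $t = Zw$ and fit

$$\hat y_{\text{PLS}} = t \, \frac{t^\top y}{t^\top t} = Zw \, \frac{\sum_j w_j^2}{\sum_j d_j w_j^2}.$$

If all $d_j$ equal a common value $d$ (for example, scores standardized to unit variance), the fraction reduces to $1/d$ and

$$\hat y_{\text{PLS}} = \sum_{j=1}^{k} z_j \, \frac{z_j^\top y}{d} = \hat y_{\text{OLS}}.$$

With unequal $d_j$ the two fits generally differ, so the equivalence relies on the scores being put on a common scale.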
In any case, from a predictive-power perspective, it does not really make sense to limit the number of components in PLS (or PCR) to one. Instead, the number of components in PLS/PCR should be treated as a "meta" parameter (a tuning parameter) that is chosen using resampling (e.g. using the R package caret, which provides a really nice harness for that).
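For readers who want to see what such tuning does under the hood, here is a minimal sketch in plain Python rather than caret. It is an illustration under my own assumptions: the data are simulated, the helper names (`fit_pls1`, `predict_pls1`, `cv_rmse`) are hypothetical, and the PLS fit is a bare-bones single-response NIPALS-style implementation, not what caret or any particular library does internally.

```python
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def fit_pls1(X, y, n_components):
    """Single-response PLS by successive deflation (NIPALS-style sketch)."""
    n = len(X)
    x_mean = [sum(col) / n for col in zip(*X)]
    y_mean = sum(y) / n
    Xc = [[v - m for v, m in zip(row, x_mean)] for row in X]
    yc = [v - y_mean for v in y]
    comps = []
    for _ in range(n_components):
        w = [dot(col, yc) for col in zip(*Xc)]        # weights ~ X^T y
        norm = dot(w, w) ** 0.5
        if norm < 1e-12:                              # nothing left to explain
            break
        w = [v / norm for v in w]
        t = [dot(row, w) for row in Xc]               # scores
        tt = dot(t, t)
        if tt < 1e-12:
            break
        p = [dot(col, t) / tt for col in zip(*Xc)]    # X loadings
        q = dot(yc, t) / tt                           # y loading
        Xc = [[v - ti * pj for v, pj in zip(row, p)] for row, ti in zip(Xc, t)]
        yc = [v - q * ti for v, ti in zip(yc, t)]
        comps.append((w, p, q))
    return x_mean, y_mean, comps

def predict_pls1(model, X):
    x_mean, y_mean, comps = model
    preds = []
    for row in X:
        r = [v - m for v, m in zip(row, x_mean)]
        yhat = y_mean
        for w, p, q in comps:                         # replay the deflation steps
            t = dot(r, w)
            yhat += q * t
            r = [v - t * pj for v, pj in zip(r, p)]
        preds.append(yhat)
    return preds

def cv_rmse(X, y, n_components, k=5):
    """k-fold cross-validated RMSE; centering is refit inside each fold."""
    n = len(X)
    sq_err = 0.0
    for fold in range(k):
        test_idx = list(range(fold, n, k))
        held = set(test_idx)
        train_idx = [i for i in range(n) if i not in held]
        model = fit_pls1([X[i] for i in train_idx],
                         [y[i] for i in train_idx], n_components)
        preds = predict_pls1(model, [X[i] for i in test_idx])
        sq_err += sum((ph - y[i]) ** 2 for ph, i in zip(preds, test_idx))
    return (sq_err / n) ** 0.5

# Simulated data (hypothetical): 6 predictors, only the first two matter.
rng = random.Random(0)
n, p = 120, 6
X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [row[0] - 2 * row[1] + rng.gauss(0, 0.5) for row in X]

scores = {a: cv_rmse(X, y, a) for a in range(1, p + 1)}
best = min(scores, key=scores.get)
print("CV RMSE by number of components:", scores)
print("selected number of components:", best)
```

The important design point is that every data-dependent step (here, the centering and the PLS fit) is redone on the training part of each fold, so the held-out error is an honest estimate. In practice one would often prefer the smallest number of components whose error is close to the minimum rather than the strict minimizer.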