PLS regression relies on iterative algorithms (e.g., NIPALS, SIMPLS). Your description of the main ideas is correct: we seek one (PLS1: one response variable, multiple predictors) or two (PLS2, with different modes: multiple response variables, multiple predictors) vector(s) of weights, say $u$ (and $v$), to form linear combination(s) of the original variables such that the covariance between $Xu$ and $y$ ($Yv$, for PLS2) is maximal. Let us focus on extracting the first pair of weights, associated with the first component. Formally, the criterion to optimize reads
$$\max\text{cov}(Xu, Yv).\qquad (1)$$
In your case, $Y$ is univariate, so it amounts to maximizing
$$\text{cov}(Xu, y)\equiv \text{Var}(Xu)^{1/2}\times\text{cor}(Xu, y)\times\text{Var}(y)^{1/2},\quad \text{s.t. } \|u\|=1.$$
Since $\text{Var}(y)$ does not depend on $u$, we have to maximize $\text{Var}(Xu)^{1/2}\times\text{cor}(Xu, y)$. Let us consider $X=[x_1;x_2]$, where the data are individually standardized (I initially made the mistake of scaling your linear combination instead of $x_1$ and $x_2$ separately!), so that $\text{Var}(x_1)=\text{Var}(x_2)=1$; however, $\text{Var}(Xu)\neq 1$ in general and depends on $u$. In conclusion, maximizing the correlation between the latent component and the response variable will not yield the same result as maximizing the covariance.
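To see this concretely, here is a small base-R sketch (the variable names are mine, for illustration) showing that even when each column of $X$ is standardized, $\text{Var}(Xu)$ still depends on the unit-norm weight vector $u$:

```r
# Even with standardized columns, Var(Xu) depends on u.
set.seed(42)
X <- apply(replicate(2, rnorm(100)), 2, scale)  # Var(x1) = Var(x2) = 1
u1 <- c(1, 0)                                   # two unit-norm weight vectors
u2 <- c(1, 1)/sqrt(2)
drop(var(X %*% u1))  # exactly 1: the component is x1 itself
drop(var(X %*% u2))  # equals 1 + cor(x1, x2), generally != 1
```

The second value differs from 1 whenever $x_1$ and $x_2$ are (empirically) correlated, which is why the covariance and correlation criteria part ways.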
I should thank Arthur Tenenhaus who pointed me in the right direction.
Using unit weight vectors is not restrictive, and some packages (pls.regression in plsgenomics, based on code from Wehrens's earlier package pls.pcr) will return unstandardized weight vectors (but with latent components still of norm 1) if requested. However, most PLS packages will return a standardized $u$, including the one you used, notably those implementing the SIMPLS or NIPALS algorithm; I found a good overview of both approaches in Barry M. Wise's presentation, Properties of Partial Least Squares (PLS) Regression, and differences between Algorithms, and the chemometrics vignette offers a good discussion too (pp. 26-29). Of particular importance as well is the fact that most PLS routines (at least those I know of in R) expect unstandardized variables, because centering and/or scaling is handled internally (this is particularly important when doing cross-validation, for example).
Given the constraint $u'u=1$, the vector $u$ is found to be $$u=\frac{X'y}{\|X'y\|}.$$
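A short sketch of why (for mean-centered data, so that $\text{cov}(Xu,y)\propto u'X'y$): maximizing $u'X'y$ subject to $u'u=1$ with a Lagrange multiplier gives
$$\frac{\partial}{\partial u}\left[u'X'y-\lambda(u'u-1)\right]=X'y-2\lambda u=0\quad\Rightarrow\quad u\propto X'y,$$
and the unit-norm constraint then fixes the scale, $u=X'y/\|X'y\|$.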
Using a little simulation, it can be obtained as follows:
set.seed(101)
X <- replicate(2, rnorm(100))
y <- 0.6*X[,1] + 0.7*X[,2] + rnorm(100)
X <- apply(X, 2, scale)
y <- scale(y)
# NIPALS (PLS1)
u <- crossprod(X, y)
u <- u/drop(sqrt(crossprod(u))) # X weights
t <- X%*%u # latent component (scores)
p <- crossprod(X, t)/drop(crossprod(t)) # X loadings
You can compare the above results (u = [0.5792043; 0.8151824], in particular) with what R packages would give. E.g., using NIPALS from the chemometrics package (another implementation that I know of is available in the mixOmics package), we would obtain:
library(chemometrics)
pls1_nipals(X, y, 1)$W # X weights [0.5792043;0.8151824]
pls1_nipals(X, y, 1)$P # X loadings
Similar results would be obtained with plsr and its default kernel PLS algorithm:
> library(pls)
> as.numeric(loading.weights(plsr(y ~ X, ncomp=1)))
[1] 0.5792043 0.8151824
In all cases, we can check that $u$ has unit norm.
Provided you change the function you optimize to one that reads
f <- function(u) cov(y, X%*%(u/sqrt(drop(crossprod(u)))))
and normalize u afterwards (u <- u/sqrt(drop(crossprod(u)))), you should be closer to the above solution. (The drop() matters: crossprod() returns a 1×1 matrix, and elementwise division of a longer vector by a 1×1 matrix raises an error in R.)
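As a check, here is a self-contained sketch (the data-generation step only loosely mirrors the simulation above) showing that numerically maximizing this normalized covariance with optim() recovers the closed-form weights $X'y/\|X'y\|$ up to sign:

```r
# Numerically maximize cov(y, Xu/||u||) and compare with the closed form.
set.seed(101)
X <- apply(replicate(2, rnorm(100)), 2, scale)
y <- scale(0.6*X[,1] + 0.7*X[,2] + rnorm(100))
f <- function(u) drop(cov(y, X %*% (u/sqrt(drop(crossprod(u))))))
opt <- optim(c(1, 1), function(u) -f(u), method = "BFGS")
u_hat <- opt$par/sqrt(drop(crossprod(opt$par)))  # numerical solution, unit norm
u_cf  <- drop(crossprod(X, y))
u_cf  <- u_cf/sqrt(drop(crossprod(u_cf)))        # closed form X'y/||X'y||
```

Because the objective is scale-invariant in u, optim() can stop anywhere along the optimal ray; normalizing afterwards, as above, makes the two solutions comparable.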
Sidenote: As criterion (1) is equivalent to
$$\max u'X'Yv,$$
$u$ can be found as the left singular vector from the SVD of $X'Y$ corresponding to the largest singular value:
svd(crossprod(X, y))$u
In the more general case (PLS2), a way to summarize the above is to say that the first pair of PLS weight vectors provides the best rank-one approximation of the cross-covariance matrix of X and Y.
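As an illustration of that statement (simulated data, arbitrary dimensions chosen by me), the first weight pair $(u, v)$ can be read off the leading singular triplet of the cross-product matrix $X'Y$:

```r
# PLS2 sketch: first weight pair from the SVD of X'Y.
set.seed(101)
X <- apply(replicate(3, rnorm(100)), 2, scale)
Y <- apply(replicate(2, rnorm(100)), 2, scale)
s <- svd(crossprod(X, Y))
u <- s$u[, 1]                 # X weights (unit norm)
v <- s$v[, 1]                 # Y weights (unit norm)
# For centered data, cov(Xu, Yv) = u'X'Yv/(n-1) = d1/(n-1), where d1 is the
# largest singular value -- no other unit-norm pair can do better.
drop(cov(X %*% u, Y %*% v))   # equals s$d[1]/(100 - 1)
```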
References
- Tenenhaus, M (1999). L'approche PLS. Revue de Statistique Appliquée, 47(2), 5-40.
- ter Braak, CJF and de Jong, S (1998). The objective function of partial least squares regression. Journal of Chemometrics, 12, 41-54.
- Abdi, H (2010). Partial least squares regression and projection on latent structure regression (PLS Regression). Wiley Interdisciplinary Reviews: Computational Statistics, 2, 97-106.
- Boulesteix, A-L and Strimmer, K (2007). Partial least squares: a versatile tool for the analysis of high-dimensional genomic data. Briefings in Bioinformatics, 8(1), 32-44.
Best Answer
No, the pls package does not maximize the correlation between scores and response values in its default settings. I could not find whether the package offers that functionality, although the manual mentions it in a sentence.
And you are right: you need to work with standardized matrices to do PLS for correlation maximization.