I am very new in partial least squares (PLS) and I try to understand the output of the R function plsr()
in the pls
package. Let us simulate data and run the PLS:
library(pls)
n <- 50
x1 <- rnorm(n); xx1 <- scale(x1)
x2 <- rnorm(n); xx2 <- scale(x2)
y <- x1 + x2 + rnorm(n,0,0.1); yy <- scale(y)
p <- plsr(yy ~ xx1+xx2, ncomp=1)
I was expecting that the following numbers $a$ and $b$
> ( w <- loading.weights(p) )
Loadings:
Comp 1
xx1 0.723
xx2 0.690
Comp 1
SS loadings 1.0
Proportion Var 0.5
> a <- w["xx1",]
> b <- w["xx2",]
> a^2+b^2
[1] 1
are calculated in order to maximize
> cor(y, a*xx1+b*xx2)
[,1]
[1,] 0.9981291
but this is not exactly the case:
> f <- function(ab){
+ a <- ab[1]; b <- ab[2]
+ cor(y, a*xx1+b*xx2)
+ }
> optim(c(0.7,0.6), f, control=list(fnscale=-1))
$par
[1] 0.7128259 0.6672870
$value
[1] 0.9981618
Is it a numerical error, or do I misunderstand the nature of $a$ and $b$ ?
I would also like to know what are these coefficients:
> p$coef
, , 1 comps
yy
xx1 0.6672848
xx2 0.6368604
EDIT: Now I see what p$coef
is:
> x <- a*xx1+b*xx2
> coef(lm(yy~0+x))
x
0.9224208
> coef(lm(yy~0+x))*a
x
0.6672848
> coef(lm(yy~0+x))*b
x
0.6368604
So I think I'm right about the nature of $a$ and $b$.
EDIT: In view of the comments given by @chl I feel my question is not clear enough, so let me provide more details. In my example there is a vector $Y$ of responses and a two-columns matrix $X$ of predictors and I use the normalized version $\tilde Y$ of $Y$ and the normalized version $\tilde X$ of $X$ (centered and divided by standard deviations). The definition of the first PLS component $t_1$ is $t_1 = a \tilde X_1 + b \tilde X_2$ with $a$ and $b$ chosen in order to have a maximal value of the inner product $\langle t_1, \tilde Y \rangle$. Hence it is equivalent to maximizing the correlation between $t_1$ and $Y$, isn't it ?
Best Answer
PLS regression relies on iterative algorithms (e.g., NIPALS, SIMPLS). Your description of the main ideas is correct: we seek one (PLS1, one response variable/multiple predictors) or two (PLS2, with different modes, multiple response variables/multiple predictors) vector(s) of weights, $u$ (and $v$), say, to form linear combination(s) of the original variable(s) such that the covariance between Xu and Y (Yv, for PLS2) is maximal. Let us focus on extracting the first pair of weights associated to the first component. Formally, the criterion to optimize reads $$\max\text{cov}(Xu, Yv).\qquad (1)$$ In your case, $Y$ is univariate, so it amounts to maximize $$\text{cov}(Xu, y)\equiv \text{Var}(Xu)^{1/2}\times\text{cor}(Xu, y)\times\text{Var}(y)^{1/2},\quad st. \|u\|=1.$$ Since $\text{Var}(y)$ does not depend on $u$, we have to maximise $\text{Var}(Xu)^{1/2}\times\text{cor}(Xu, y)$. Let's consider
X=[x_1;x_2]
, where data are individually standardized (I initially made the mistake of scaling your linear combination instead of $x_1$ and $x_2$ separately!), so that $\text{Var}(x_1)=\text{Var}(x_2)=1$; however, $\text{Var}(Xu)\neq 1$ and depends on $u$. In conclusion, maximizing the correlation between the latent component and the response variable will not yield the same results.I should thank Arthur Tenenhaus who pointed me in the right direction.
Using unit weight vectors is not restrictive and some packages (
pls. regression
in plsgenomics, based on code from Wehrens's earlier packagepls.pcr
) will return unstandardized weight vectors (but with latent components still of norm 1), if requested. But most of PLS packages will return standardized $u$, including the one you used, notably those implementing the SIMPLS or NIPALS algorithm; I found a good overview of both approaches in Barry M. Wise's presentation, Properties of Partial Least Squares (PLS) Regression, and differences between Algorithms, but the chemometrics vignette offers a good discussion too (pp. 26-29). Of particular importance as well is the fact that most PLS routines (at least the one I know in R) assume that you provide unstandardized variables because centering and/or scaling is handled internally (this is particularly important when doing cross-validation, for example).Given the constraint $u'u=1$, the vector $u$ is found to be $$u=\frac{X'y}{\|X'y\|}.$$
Using a little simulation, it can be obtained as follows:
You can compare the above results (
u=[0.5792043;0.8151824]
, in particular) with what R packages would give. E.g., using NIPALS from the chemometrics package (another implementation that I know is available in the mixOmics package), we would obtain:Similar results would be obtained with
plsr
and its default kernel PLS algorithm:In all cases, we can check that $u$ is of length 1.
Provided you change your function to optimize to one that reads
and normalize
u
afterwards (u <- u/sqrt(crossprod(u))
), you should be closer to the above solution.Sidenote: As criterion (1) is equivalent to $$\max u'X'Yv,$$ $u$ can be found as the left singular vector from the SVD of $X'Y$ corresponding to the largest eigenvalue:
In the more general case (PLS2), a way to summarize the above is to say that the first PLS canonical vectors are the best approximation of the covariance matrix of X and Y in both directions.
References