Solved – Partial least squares regression in R: why is PLS on standardized data not equivalent to maximizing correlation

partial least squares, regression

I am very new to partial least squares (PLS) and am trying to understand the output of the R function plsr() in the pls package. Let us simulate some data and run PLS:

library(pls)
n <- 50
x1 <- rnorm(n); xx1 <- scale(x1) 
x2 <- rnorm(n); xx2 <- scale(x2)
y <- x1 + x2 + rnorm(n,0,0.1); yy <- scale(y)
p <- plsr(yy ~ xx1+xx2, ncomp=1)

I was expecting that the following numbers $a$ and $b$

> ( w <- loading.weights(p) )

Loadings:
    Comp 1
xx1 0.723 
xx2 0.690 

               Comp 1
SS loadings       1.0
Proportion Var    0.5
> a <- w["xx1",]
> b <- w["xx2",]
> a^2+b^2
[1] 1

are calculated in order to maximize

> cor(y, a*xx1+b*xx2)
          [,1]
[1,] 0.9981291

but this is not exactly the case:

> f <- function(ab){
+ a <- ab[1]; b <- ab[2]
+ cor(y, a*xx1+b*xx2)
+ }
> optim(c(0.7,0.6), f, control=list(fnscale=-1))
$par
[1] 0.7128259 0.6672870

$value
[1] 0.9981618

Is it a numerical error, or do I misunderstand the nature of $a$ and $b$?

I would also like to know what these coefficients are:

> p$coef
, , 1 comps

           yy
xx1 0.6672848
xx2 0.6368604 

EDIT: Now I see what p$coef is:

> x <- a*xx1+b*xx2
> coef(lm(yy~0+x))
        x 
0.9224208 
> coef(lm(yy~0+x))*a
        x 
0.6672848 
> coef(lm(yy~0+x))*b
        x 
0.6368604 

So I think I'm right about the nature of $a$ and $b$.

EDIT: In view of the comments given by @chl, I feel my question is not clear enough, so let me provide more details. In my example there is a vector $Y$ of responses and a two-column matrix $X$ of predictors, and I use the normalized version $\tilde Y$ of $Y$ and the normalized version $\tilde X$ of $X$ (each column centered and divided by its standard deviation). The first PLS component $t_1$ is defined as $t_1 = a \tilde X_1 + b \tilde X_2$, with $a$ and $b$ chosen so that the inner product $\langle t_1, \tilde Y \rangle$ is maximal. Hence isn't it equivalent to maximizing the correlation between $t_1$ and $Y$?
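In code, the inner product I have in mind is just the cross product of the component with the standardized response; a quick sketch, reusing a, b, xx1, xx2, yy and n from above:

t1 <- a*xx1 + b*xx2
drop(crossprod(t1, yy))       # <t1, Y~>
(n - 1) * drop(cov(t1, yy))   # same value, since both vectors are centered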

Best Answer

PLS regression relies on iterative algorithms (e.g., NIPALS, SIMPLS). Your description of the main ideas is correct: we seek one (PLS1: one response variable, multiple predictors) or two (PLS2, with different modes: multiple response variables, multiple predictors) vector(s) of weights, $u$ (and $v$), say, to form linear combination(s) of the original variables such that the covariance between $Xu$ and $y$ ($Yv$, for PLS2) is maximal.

Let us focus on extracting the first pair of weights, associated to the first component. Formally, the criterion to optimize reads $$\max \text{cov}(Xu, Yv).\qquad (1)$$ In your case, $Y$ is univariate, so it amounts to maximizing $$\text{cov}(Xu, y)\equiv \text{Var}(Xu)^{1/2}\times\text{cor}(Xu, y)\times\text{Var}(y)^{1/2},\quad \text{s.t. } \|u\|=1.$$ Since $\text{Var}(y)$ does not depend on $u$, we have to maximize $\text{Var}(Xu)^{1/2}\times\text{cor}(Xu, y)$.

Now consider $X=[x_1; x_2]$, where the data are individually standardized (I initially made the mistake of scaling your linear combination instead of $x_1$ and $x_2$ separately!), so that $\text{Var}(x_1)=\text{Var}(x_2)=1$; however, $\text{Var}(Xu)\neq 1$ in general, and it depends on $u$. In conclusion, maximizing the correlation between the latent component and the response variable will not yield the same weights, because the omitted factor $\text{Var}(Xu)^{1/2}$ also varies with $u$.
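A quick numerical illustration of the identity above, reusing xx1, xx2 and yy from your simulation (cov(), cor() and sd() all use the same n-1 denominator, so the equality is exact):

u <- c(0.723, 0.690)                  # the unit-norm weights returned by plsr()
t1 <- drop(u[1]*xx1 + u[2]*xx2)       # first latent component
drop(cov(yy, t1))                     # cov(Xu, y)
sd(t1) * drop(cor(yy, t1)) * sd(yy)   # Var(Xu)^{1/2} * cor(Xu, y) * Var(y)^{1/2}
sd(t1)                                # Var(Xu)^{1/2} is not 1 and depends on u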

I should thank Arthur Tenenhaus, who pointed me in the right direction.

Using unit weight vectors is not restrictive, and some packages (pls.regression in plsgenomics, based on code from Wehrens's earlier package pls.pcr) will return unstandardized weight vectors (but with latent components still of norm 1), if requested. But most PLS packages will return standardized $u$, including the one you used, notably those implementing the SIMPLS or NIPALS algorithm; I found a good overview of both approaches in Barry M. Wise's presentation, Properties of Partial Least Squares (PLS) Regression, and differences between Algorithms, but the chemometrics vignette offers a good discussion too (pp. 26-29). Of particular importance as well is the fact that most PLS routines (at least the ones I know in R) assume that you provide unstandardized variables, because centering and/or scaling is handled internally (this is particularly important when doing cross-validation, for example).
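As a quick check of that internal handling, plsr() can standardize the predictors itself via its scale argument; a sketch reusing x1, x2, y and the standardized xx1, xx2, yy from your question:

X.raw <- cbind(x1, x2)
w1 <- loading.weights(plsr(y ~ X.raw, ncomp = 1, scale = TRUE))
w2 <- loading.weights(plsr(yy ~ xx1 + xx2, ncomp = 1))
all.equal(as.numeric(w1), as.numeric(w2))  # should be TRUE: same weights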

Given the constraint $u'u=1$, the vector $u$ is found to be $$u=\frac{X'y}{\|X'y\|}.$$

Using a little simulation, it can be obtained as follows:

set.seed(101)
X <- replicate(2, rnorm(100))
y <- 0.6*X[,1] + 0.7*X[,2] + rnorm(100)
X <- apply(X, 2, scale)
y <- scale(y)

# NIPALS (PLS1)
u <- crossprod(X, y)                        # proportional to X'y
u <- u/drop(sqrt(crossprod(u)))             # X weights (unit norm)
t <- X %*% u                                # scores: first latent component
p <- crossprod(X, t)/drop(crossprod(t))     # X loadings

You can compare the above results ($u = [0.5792043; 0.8151824]$, in particular) with what R packages would give. E.g., using NIPALS from the chemometrics package (another implementation I know of is available in the mixOmics package), we would obtain:

library(chemometrics)
pls1_nipals(X, y, 1)$W  # X weights [0.5792043;0.8151824]
pls1_nipals(X, y, 1)$P  # X loadings

Similar results would be obtained with plsr and its default kernel PLS algorithm:

> library(pls)
> as.numeric(loading.weights(plsr(y ~ X, ncomp=1)))
[1] 0.5792043 0.8151824

In all cases, we can check that $u$ has unit norm.
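For instance, reusing the u computed in the NIPALS snippet above:

drop(crossprod(u))  # u'u = 1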

Provided you change the function you optimize to one that reads

f <- function(u) cov(y, X %*% (u / drop(sqrt(crossprod(u)))))

(the drop() is needed because crossprod() returns a 1x1 matrix, which would not conform with the length-2 vector u) and normalize the resulting u afterwards (u <- u/drop(sqrt(crossprod(u)))), you should be closer to the above solution.
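For example, a sketch reusing X, y and the corrected f from above (optim() only approximates the maximizer, and f depends only on the direction of u, not its length):

opt <- optim(c(1, 1), f, control = list(fnscale = -1))
opt$par/drop(sqrt(crossprod(opt$par)))  # close to u = (0.5792, 0.8152)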

Sidenote: As criterion (1) is equivalent to $$\max u'X'Yv,$$ $u$ can be found as the left singular vector of $X'Y$ associated with its largest singular value:

svd(crossprod(X, y))$u

In the more general case (PLS2), a way to summarize the above is to say that the first pair of PLS canonical vectors provides the best rank-one approximation of the cross-covariance matrix of X and Y, in both directions.
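As a small illustration of the PLS2 case, a sketch where Y2 is a hypothetical two-column response built from the simulated X above; the first pair of singular vectors of $X'Y$ gives both weight vectors at once:

Y2 <- cbind(drop(y), 0.5*X[,1] - 0.2*X[,2] + rnorm(100))  # hypothetical bivariate response
Y2 <- apply(Y2, 2, scale)
s <- svd(crossprod(X, Y2))  # SVD of X'Y
s$u[, 1]                    # first X weight vector (u)
s$v[, 1]                    # first Y weight vector (v)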

References

  1. Tenenhaus, M (1999). L'approche PLS. Revue de Statistique Appliquée, 47(2), 5–40.
  2. ter Braak, CJF and de Jong, S (1998). The objective function of partial least squares regression. Journal of Chemometrics, 12(1), 41–54.
  3. Abdi, H (2010). Partial least squares regression and projection on latent structure regression (PLS Regression). Wiley Interdisciplinary Reviews: Computational Statistics, 2, 97–106.
  4. Boulesteix, A-L and Strimmer, K (2007). Partial least squares: a versatile tool for the analysis of high-dimensional genomic data. Briefings in Bioinformatics, 8(1), 32–44.