Why do all the PLS components together explain only a part of the variance of the original data?

covariance-matrix, partial-least-squares, pca, regression

I have a dataset consisting of 10 variables. I ran partial least squares (PLS) to predict a single response variable from these 10 variables, extracted 10 PLS components, and then computed the variance of each component. On the original data I took the sum of the variances of all variables, which is 702.

Then I divided the variance of each PLS component by this sum to get the percentage of the variance explained by PLS, and surprisingly all components together explain only 44% of the original variance.
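For reference, here is a minimal sketch of the kind of computation I mean, using scikit-learn's `PLSRegression` on made-up data (my actual dataset is not shown; `scale=False` simply keeps `X` in its original units):

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10)) * rng.uniform(1, 10, size=10)  # 10 made-up predictors
y = X @ rng.normal(size=10) + rng.normal(size=100)            # single response

# PLS1 with as many components as predictors; scale=False keeps X in its
# original units (the data are still mean-centered internally).
pls = PLSRegression(n_components=10, scale=False).fit(X, y)
T = pls.transform(X)                       # the 10 PLS components (score vectors)

total_var = X.var(axis=0, ddof=1).sum()    # analogue of the 702 above
component_var = T.var(axis=0, ddof=1)      # variance of each PLS component
print(component_var.sum() / total_var)     # strictly less than 1, i.e. not 100%
```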

What is the explanation of that? Shouldn't it be 100%?

Best Answer

The sum of the variances of all PLS components is normally less than the total variance of the data, i.e. less than 100% of it.

There are many variants of partial least squares (PLS). What you used here is PLS regression of a univariate response variable $\mathbf y$ onto several variables $\mathbf X$; this algorithm is traditionally known as PLS1 (as opposed to other variants; see Rosipal & Krämer, 2006, Overview and Recent Advances in Partial Least Squares, for a concise overview). PLS1 was later shown to be equivalent to a more elegant formulation called SIMPLS (see the reference to the paywalled de Jong 1993 paper in Rosipal & Krämer). The view provided by SIMPLS helps to understand what is going on in PLS1.

It turns out that what PLS1 does is find a sequence of linear projections $\mathbf t_i = \mathbf X \mathbf w_i$ such that:

  1. Covariance between $\mathbf y$ and $\mathbf t_i$ is maximal;
  2. All weight vectors have unit length, $\|\mathbf w_i\|=1$;
  3. Any two PLS components (aka score vectors) $\mathbf t_i$ and $\mathbf t_j$ are uncorrelated.

Note that weight vectors do not have to be (and are not) orthogonal.
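These properties are easy to check numerically. Below is a rough sketch on made-up data using scikit-learn's `PLSRegression`; its `x_rotations_` are the weights that map the centered $\mathbf X$ to the scores, and I rescale them to unit length so that they play the role of the $\mathbf w_i$ above:

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10)) * rng.uniform(1, 10, size=10)  # made-up data
y = X @ rng.normal(size=10) + rng.normal(size=200)

pls = PLSRegression(n_components=10, scale=False).fit(X, y)
Xc = X - X.mean(axis=0)

# Unit-length weight vectors acting on the original (centered) X.
W = pls.x_rotations_ / np.linalg.norm(pls.x_rotations_, axis=0)
T = Xc @ W                                            # projections t_i = X w_i

print(np.allclose(np.linalg.norm(W, axis=0), 1.0))    # property 2: unit length -> True
C = np.corrcoef(T, rowvar=False)
print(np.allclose(C, np.eye(10), atol=1e-6))          # property 3: uncorrelated -> True
print(np.allclose(W.T @ W, np.eye(10), atol=1e-6))    # weights orthogonal? -> False
```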

This means that if $\mathbf X$ consists of $k=10$ variables and you found $10$ PLS components, then you have found a non-orthogonal basis with uncorrelated projections onto the basis vectors. One can mathematically prove that in such a situation the sum of the variances of all these projections will be less than the total variance of $\mathbf X$. They would be equal if the weight vectors were orthogonal (as, e.g., in PCA), but in PLS this is not the case.
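To get some intuition for why this happens, consider a toy two-variable example (my own construction, not related to your data). Take the covariance matrix $\boldsymbol\Sigma = \operatorname{diag}(4, 1)$, so the total variance is $5$, and the unit vectors $$\mathbf w_1 = \tfrac{1}{\sqrt{2}}\begin{pmatrix}1\\ 1\end{pmatrix}, \qquad \mathbf w_2 = \tfrac{1}{\sqrt{17}}\begin{pmatrix}1\\ -4\end{pmatrix}.$$ They are not orthogonal ($\mathbf w_1^\top \mathbf w_2 = -3/\sqrt{34} \ne 0$), but the projections onto them are uncorrelated, because $\mathbf w_1^\top \boldsymbol\Sigma \mathbf w_2 = (4-4)/\sqrt{34} = 0$. The variances of the projections are $\mathbf w_1^\top \boldsymbol\Sigma \mathbf w_1 = 5/2$ and $\mathbf w_2^\top \boldsymbol\Sigma \mathbf w_2 = 20/17$, which sum to about $3.68$, i.e. less than the total variance of $5$.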

I don't know of any textbook or paper that explicitly discusses this issue, but I have previously explained it in the context of linear discriminant analysis (LDA), which also yields a number of uncorrelated projections onto non-orthogonal unit weight vectors; see here: Proportion of explained variance in PCA and LDA.
