PLS Regression – Relationship Between PLS Regression / Discriminant Analysis and Linear Regression

least squares, partial least squares, regression

I'm interested in the problem of feature selection, and I came across PLS-DA, which seems to be a "hack" on PLSR (one more reference). PLSR relates a matrix X to a matrix Y, and PLS-DA relates a matrix X to some labels, represented as a matrix Y.

Now, in PLSR/DA, you can decide the number of components you'd like to use to relate X to Y.

If the number of components is 1, and Y is only a vector of 0/1 class labels, does PLS-DA just become Linear (OLS) regression? If not, how is it different?

(Note: it seems like a similar version of this question has already been asked here.)

Best Answer

PLS-DA is in fact a hack on PLSR. PLS tries to find uncorrelated projections (scores) of X that maximize the covariance between those scores and Y. If you use all components (called latent variables in PLS), it becomes OLS. By using a single component, or any number of components less than the maximum possible, you are effectively penalizing the remaining directions, which are assumed to be less useful for regression (such as information in X that is irrelevant to Y). So with one component and a 0/1 label vector, PLS-DA is in general not the same as OLS.
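To make the single-component case concrete, here is a short worked derivation. It assumes a mean-centered X and a single mean-centered 0/1 response vector y; the symbols w, t and b are introduced only for this illustration. The first PLS weight vector is proportional to $\mathbf w = \mathbf X' \mathbf y$, the corresponding score is $\mathbf t = \mathbf X \mathbf w$, and regressing y on t gives

$\mathbf b_{PLS} = \mathbf w \, \frac{\mathbf t' \mathbf y}{\mathbf t' \mathbf t} = \mathbf X' \mathbf y \, \frac{\mathbf y' \mathbf X \mathbf X' \mathbf y}{\mathbf y' \mathbf X \mathbf X' \mathbf X \mathbf X' \mathbf y}$

whereas OLS (when $\mathbf X' \mathbf X$ is invertible) gives

$\mathbf b_{OLS} = (\mathbf X' \mathbf X)^{-1} \mathbf X' \mathbf y$

The one-component PLS coefficient vector is therefore just a rescaled copy of $\mathbf X' \mathbf y$. It coincides with the OLS solution only in the special case where $\mathbf X' \mathbf y$ happens to be an eigenvector of $\mathbf X' \mathbf X$, for example when the columns of X are uncorrelated and have equal variance.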

PLS with the SIMPLS algorithm produces the quantities below, where:

  • m: number of observations
  • n: number of variables
  • l: number of classes
  • a: number of components (LVs)

$\mathbf T_{(m \times a)} = \mathbf X_{(m \times n)} \mathbf R_{(n \times a)}$ where T holds the X scores for the a components and R holds the weights for the a components.

$\mathbf{\hat Y}_{(m \times l)} = \mathbf T_{(m \times a)} \mathbf Q_{(l \times a)}'$ This is the prediction step.

Therefore you can define the regression coefficients:

$\mathbf B_{(n \times l)} = \mathbf R_{(n \times a)} \mathbf Q_{(l \times a)}'$

So the prediction can be done with a single regression matrix B:

$\mathbf{\hat Y}_{(m \times l)} = \mathbf X_{(m \times n)} \mathbf B_{(n \times l)}$

Now, if you retain all components (by setting $a = \operatorname{rank}(\mathbf X)$, which is at most $\min(m-1, n)$ for mean-centered X), then B is equivalent to the OLS solution of $\mathbf Y = \mathbf X \mathbf B$. If you choose a smaller number of components, it is PLSR proper, and in that case the only thing OLS and PLSR still have in common is the size of the B matrix.
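The relations above can be checked numerically. The following is a minimal NumPy sketch of SIMPLS, not a reference implementation. The function name `simpls`, the synthetic data and the choice of m = 30, n = 4 are all assumptions made for illustration, and X and Y are mean-centered before fitting. Because X has full column rank here, "all components" means a = n.

```python
import numpy as np

def simpls(X, Y, a):
    """Minimal SIMPLS sketch. X (m x n) and Y (m x l) must be mean-centered."""
    n, l = X.shape[1], Y.shape[1]
    R = np.zeros((n, a))   # weights
    Q = np.zeros((l, a))   # Y loadings
    V = np.zeros((n, a))   # orthonormal basis spanning the X loadings
    S = X.T @ Y            # cross-product matrix, deflated at every step
    for k in range(a):
        # Weight direction: dominant left singular vector of S
        r = np.linalg.svd(S, full_matrices=False)[0][:, 0]
        t = X @ r
        norm_t = np.linalg.norm(t)
        t, r = t / norm_t, r / norm_t          # scale so the score has unit length
        p = X.T @ t                            # X loading for this component
        v = p - V[:, :k] @ (V[:, :k].T @ p)    # orthogonalize against earlier loadings
        v /= np.linalg.norm(v)
        S = S - np.outer(v, v @ S)             # remove the explained direction from S
        R[:, k], Q[:, k], V[:, k] = r, Y.T @ t, v
    T = X @ R                                  # scores, T = X R
    return T, R, Q, R @ Q.T                    # last item is B = R Q'

# Synthetic example (hypothetical data): 30 observations, 4 variables, 0/1 labels
rng = np.random.default_rng(0)
m, n = 30, 4
X = rng.normal(size=(m, n))
y = (X[:, 0] + 0.5 * rng.normal(size=m) > 0).astype(float)
Xc = X - X.mean(axis=0)
Yc = (y - y.mean())[:, None]

B_ols = np.linalg.lstsq(Xc, Yc, rcond=None)[0]
T, R, Q, B_full = simpls(Xc, Yc, a=n)      # all components
_, _, _, B_one = simpls(Xc, Yc, a=1)       # a single component

print(np.allclose(Xc @ B_full, T @ Q.T))   # X B versus T Q'
print(np.allclose(B_full, B_ols))          # all components versus OLS
print(np.allclose(B_one, B_ols))           # one component versus OLS
```

Under these assumptions, the first two checks should hold and the third should not: predicting with B matches predicting with T and Q, the full-component B agrees with OLS, and the one-component B does not.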

For more details, here is the original article: De Jong, Sijmen. "SIMPLS: an alternative approach to partial least squares regression." Chemometrics and intelligent laboratory systems 18, no. 3 (1993): 251-263.