Solved – What’s the difference between logistic regression and PLS-DA

Tags: logistic, partial least squares, regression

I heard about PLS-DA just recently and was wondering how it differs from multinomial logistic regression, since logistic regression can also be used for categorical dependent variables.

Best Answer

PLS-DA is closely related to LDA: for n > p, full-rank PLS-DA (i.e. using all latent variables) is the same as LDA. With a single latent variable, PLS-DA yields the same classification as closest (Euclidean) distance in feature space; in other words, the regularization "squeezes" the pooled covariance matrix into a spherical shape.

In a two-class problem where both classes follow a (multivariate) Gaussian distribution with the same covariance matrix (i.e. the situation where LDA is optimal), LR and LDA yield the same solution.
LR will need more samples to reach the same stability, though.

In other words, there is a somewhat indirect relationship.

There are important differences between PLS-DA and LR in how they weight cases:

  • PLS-DA (like LDA) takes all cases into account, regardless of how far they are from the class boundary.
    If you (ab)use PLS for dummy regression, as is frequently done in PLS-DA (i.e. y takes class labels encoded as 0 and 1, or equivalent encodings), PLS-DA will try to "squeeze" the within-class distributions to points (as required in regression).
  • LR cares mostly about cases that are close to the class boundary; cases far from the class boundary carry low weight in LR.

So if you want to use PLS for classification, make sure it is appropriate for all cases to weigh in. Two situations where this is the case are

  • the classes form nice clusters (i.e. LDA would be appropriate but you need more regularization - and for some reason don't want to do PLS-LDA)
  • the classification problem is really a regression in disguise. Example: the classes codify whether a certain metric property exceeds a threshold or not. You can then set up a proper PLS regression (with metric labels) and employ the threshold. This model will be able to benefit from cases that are at some distance to the class boundary (though you'll have to weigh this benefit against the need to get a very good prediction close to the class boundary).
For the latter situation, LR is also appropriate (but not LDA, nor dummy-coded PLS-DA); it just won't be able to benefit from cases further from the threshold the way PLS can.