Solved – the difference between PCA and PLS-DA

Tags: biostatistics, discriminant analysis, partial least squares, pca

I read the paper "Obesity changes the human gut mycobiome" (2015), in which the authors used PLS-DA to look at the differences between their groups based on the gut microbiome. I am currently working with a data set containing similar microbiome information, and I am using PCA. What is the main difference between these two methods? Why would one be advantageous compared to the other?

Best Answer

A quick answer, which I will expand in a few days:

PLS-DA is a supervised method: you supply each sample's group, and the components are built to covary with those group labels. PCA, on the other hand, is an unsupervised method: it never sees the labels and simply projects the data onto, let's say, a 2D space that preserves as much of the variance as possible, so you can observe how the samples cluster by themselves. If you color the samples by group on a PCA plot and the classes happen to separate well, PCA may look somewhat like a supervised method, but the labels played no part in building the projection.

Which one is better depends on your goal. For instance, if you know each sample's group and want to predict the group of a new sample, PLS-DA is my go-to.