Solved – What are the shortcomings of PCA as a dimensionality reduction technique compared to t-SNE

dimensionality-reduction, pca, tsne

I have been reading about and using t-SNE, which, compared to PCA, is able to preserve the neighbourhood around a point in the high-dimensional space. For example, I have 128-dimensional embeddings representing faces, created by a neural network. Using both PCA and t-SNE, I project them down into 2 dimensions.
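Roughly, I produce the projections like this (a minimal sketch with scikit-learn; the random array X below is only a stand-in for my real embeddings, and the perplexity is just the default):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

# Stand-in for the real embeddings: n points in 128 dimensions
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 128))

# Linear projection onto the two directions of largest variance
X_pca = PCA(n_components=2).fit_transform(X)

# Non-linear embedding that tries to preserve local neighbourhoods
X_tsne = TSNE(n_components=2, perplexity=30).fit_transform(X)
```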

[Figure: the 2-D projections of the face embeddings produced by PCA and t-SNE]

It is evident that t-SNE is doing a better job, and the t-SNE paper states:

For high-dimensional data that lies on or near a low-dimensional, non-linear manifold it is usually more important to keep the low-dimensional representations of very similar datapoints close together, which is typically not possible with a linear mapping.

Why is it typically not possible to keep very similar data points close together when using a linear mapping from a high-dimensional space to a low-dimensional one? I understand PCA, so could someone provide a concrete example of why this is true?

Best Answer

It all depends on how you understand "similarity" and what the goal of your transformation into the low-dimensional representation is.

PCA does not attempt to group "similar" points, whatever this "similarity" may be. PCA constructs a particular linear transformation that yields new coordinates for the samples with very well defined properties (such as orthogonality between the components). The fact that "similar" points group together is, one could say, a byproduct. More precisely, the fact that "similar" points (say, samples from the same experimental group) very often cluster in the first components is due to

  • the groups being major contributors to the overall variance (and the first components capture the major part of the total variance)
  • differences between groups very often being linear with respect to one variable or another (see the sketch after this list)
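To make these two points concrete, here is a small sketch (the group shift of 3.0, the dimensions, and the data are all made up for illustration): when the between-group mean difference dominates the total variance, the first principal component aligns with it and the groups separate along PC1.

```python
import numpy as np
from sklearn.decomposition import PCA

# Two groups whose mean difference is the dominant source of variance
rng = np.random.default_rng(1)
g1 = rng.normal(loc=0.0, scale=1.0, size=(200, 50))
g2 = rng.normal(loc=3.0, scale=1.0, size=(200, 50))  # mean shifted in every variable
X = np.vstack([g1, g2])

pca = PCA(n_components=2).fit(X)
Z = pca.transform(X)

# PC1 separates the groups; the within-group noise mostly ends up in later components
print(pca.explained_variance_ratio_)         # PC1 captures most of the variance
print(Z[:200, 0].mean(), Z[200:, 0].mean())  # group means far apart along PC1
```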

t-SNE is an algorithm designed with a different goal in mind: the ability to group "similar" data points even when the underlying structure is non-linear. The similarity is defined in a very particular way (consult the Wikipedia article for details, or see the formulas below). This definition of similarity is not exactly a common one, and it stresses local similarity and local density.
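For reference, the definitions from the original paper (van der Maaten & Hinton, 2008): in the high-dimensional space, similarity is a Gaussian conditional probability, which is then symmetrised; in the low-dimensional map it is a heavy-tailed Student-t kernel; and the map is found by minimising the Kullback-Leibler divergence between the two:

$$p_{j\mid i} = \frac{\exp\!\left(-\lVert x_i - x_j \rVert^2 / 2\sigma_i^2\right)}{\sum_{k \neq i} \exp\!\left(-\lVert x_i - x_k \rVert^2 / 2\sigma_i^2\right)}, \qquad p_{ij} = \frac{p_{j\mid i} + p_{i\mid j}}{2n},$$

$$q_{ij} = \frac{\left(1 + \lVert y_i - y_j \rVert^2\right)^{-1}}{\sum_{k \neq l} \left(1 + \lVert y_k - y_l \rVert^2\right)^{-1}}, \qquad C = \mathrm{KL}(P \,\Vert\, Q) = \sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}}.$$

Each bandwidth $\sigma_i$ is tuned so that the conditional distribution $p_{\cdot \mid i}$ has a user-chosen perplexity, which is what makes the similarity local and adaptive to the local density.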

However, while t-SNE is very good at the particular goal of placing similar samples close together, it has a major disadvantage compared to PCA: it gives you a low-dimensional representation of your data, but it does not give you a reusable transformation. In other words, you cannot

  • interpret the dimensions the way you interpret the loadings in PCA
  • apply the transformation to a new data set (see the sketch after this list)
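A small sketch of the second point (random data again as a stand-in; scikit-learn's TSNE is used here because it makes the asymmetry explicit):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

rng = np.random.default_rng(2)
X_train = rng.normal(size=(500, 128))
X_new = rng.normal(size=(10, 128))  # unseen samples

# PCA learns an explicit linear map that can be reused on new data
pca = PCA(n_components=2).fit(X_train)
Z_new = pca.transform(X_new)

# t-SNE only returns an embedding of the data it was fit on;
# scikit-learn's TSNE has no transform() for out-of-sample points
tsne = TSNE(n_components=2)
Z_train = tsne.fit_transform(X_train)
# tsne.transform(X_new)  # would fail: no such method
```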

t-SNE can therefore be useful for exploring multidimensional data, but it is much less useful for more general tasks such as building machine learning models and interpreting them.
