Solved – How to interpret t-SNE plots

data-visualization, dimensionality-reduction, tsne

I want to know how to interpret t-distributed stochastic neighbor embedding (t-SNE) plots. In particular: 1) What information do they convey, besides showing clusters? 2) In PCA we can look at the loadings and interpret the components as factors that explain variability in the original features. Is there a similar way to do that in t-SNE? Or can we only say "The data were well separated with t-SNE, but we don't know why"?

Best Answer

Unlike in PCA, the axes of the low dimensional space don't have any particular meaning. In fact, one could arbitrarily rotate the low dimensional points and the t-SNE cost function wouldn't change. Furthermore, t-SNE doesn't construct an explicit mapping between the high dimensional and low dimensional spaces.
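As a quick illustration (my own sketch, not code from the t-SNE papers; it assumes NumPy and SciPy are available), the low dimensional affinities Q, and hence the KL(P || Q) cost, depend only on pairwise distances between embedded points, so any rotation or reflection of the embedding leaves the cost unchanged:

```python
# Sketch: the t-SNE cost KL(P || Q) only sees pairwise distances in the
# embedding, so rotating (or reflecting) the embedded points leaves Q,
# and therefore the cost, unchanged.
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.stats import ortho_group

def student_t_affinities(Y):
    """Low dimensional affinities q_ij from the t-SNE cost function."""
    d2 = squareform(pdist(Y, "sqeuclidean"))
    num = 1.0 / (1.0 + d2)          # Student-t kernel with one degree of freedom
    np.fill_diagonal(num, 0.0)      # q_ii is defined to be zero
    return num / num.sum()

rng = np.random.default_rng(0)
Y = rng.normal(size=(100, 2))            # stand-in for a 2-D embedding
R = ortho_group.rvs(2, random_state=0)   # random orthogonal transform

print(np.allclose(student_t_affinities(Y), student_t_affinities(Y @ R)))  # True
```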

Rather, the relevant information is in the relative distances between low dimensional points. t-SNE captures structure in the sense that neighboring points in the input space will tend to be neighbors in the low dimensional space.
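If you want a number rather than a picture, one option (not mentioned above, so treat it as an aside) is a neighbourhood-preservation score such as scikit-learn's trustworthiness. A minimal sketch, assuming scikit-learn is installed and using its built-in digits data as a stand-in:

```python
# Sketch: score how well high dimensional neighbourhoods survive the embedding.
# trustworthiness is 1.0 when every embedded neighbour is a true neighbour.
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE, trustworthiness

X, _ = load_digits(return_X_y=True)
Y = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)
print(trustworthiness(X, Y, n_neighbors=10))
```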

But some care is needed, because larger distances can't necessarily be interpreted. If points are separated in the input space, t-SNE would like to separate them in the low dimensional space, but it doesn't care how far apart they end up (unlike PCA, MDS, or Isomap, for example). Another issue is that t-SNE sometimes breaks continuous segments of data into pieces and artificially separates them, particularly at low perplexity settings. See here for a good example. t-SNE is framed as a visualization tool rather than a pre-processing or analysis tool, and doing things like clustering in the low dimensional space can be dangerous because of these issues. The upshot is that distorting distances sometimes lets t-SNE produce good 2D/3D visualizations of data that are intrinsically higher dimensional.
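A rough way to see the perplexity effect for yourself (an illustrative sketch assuming NumPy, scikit-learn and matplotlib; the helix-like curve is made up for the demo) is to embed points sampled along a single continuous curve at several perplexity values and watch how low perplexity tends to shatter the curve into pieces:

```python
# Sketch: embed points from one continuous curve at several perplexities.
# At low perplexity the single curve is often torn into separate fragments.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0, 4 * np.pi, 500))
X = np.column_stack([t, np.sin(t), np.cos(t)])   # a 3-D helix-like curve

fig, axes = plt.subplots(1, 3, figsize=(12, 4))
for ax, perp in zip(axes, [2, 30, 100]):
    Y = TSNE(n_components=2, perplexity=perp, random_state=0).fit_transform(X)
    ax.scatter(Y[:, 0], Y[:, 1], c=t, s=5)       # colour encodes position along the curve
    ax.set_title(f"perplexity = {perp}")
plt.tight_layout()
plt.show()
```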

One way that t-SNE visualizations can be useful is by combining them with external information, which can reveal patterns in the data that we may not have been aware of. For example, the t-SNE papers show visualizations of the MNIST dataset (images of handwritten digits). Images are clustered according to the digit they represent--which we already knew, of course. But, looking within a cluster, similar images tend to be grouped together (for example, images of the digit '1' that are slanted to the left vs. the right). And points that appear in the 'wrong' cluster are sometimes actually mislabelled in the original dataset, or ambiguously written (e.g. something between a '4' and a '9').
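The MNIST-style picture described above can be approximated with scikit-learn's smaller digits dataset (a sketch under that assumption, not the figure from the papers): colour the embedding by the known labels, then inspect points that land in an unexpected cluster.

```python
# Sketch: overlay external information (known digit labels) on the embedding,
# then look inside and between the coloured clusters.
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

X, y = load_digits(return_X_y=True)
Y = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)

plt.scatter(Y[:, 0], Y[:, 1], c=y, cmap="tab10", s=8)
plt.colorbar(label="digit label")
plt.title("t-SNE of the digits data, coloured by known label")
plt.show()
```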