Dimensionality Reduction: Comparing t-SNE and MDS Techniques

data visualizationdimensionality reductionmultidimensional scalingtsne

Been reading some questions about t-SNE (t-Distributed Stochastic Neighbor Embedding) lately, and also visited some questions about MDS (Multidimensional Scaling).

They are often used analogously, so it seemed like a good idea make this question seeing there are many questions on both separately (or compared to PCA) here.


In short what makes t-SNE and MDS different? eg. What niceties of the data hierarchy they explore, different assumptions, etc.

Rate of convergence? What about use of kernels, do both comply?

Best Answer

PCA selects influential dimensions by eigenanalysis of the N data points themselves, while MDS selects influential dimensions by eigenanalysis of the $N^2$ data points of a pairwise distance matrix. This has the effect of highlighting the deviations from uniformity in the distribution. Considering the distance matrix as analogous to a stress tensor, MDS may be deemed a "force-directed" layout algorithm, the execution complexity of which is $\mathcal O(dN^a)$ where $3 < a \leq 4$.

t-SNE, on the otherhand, uses a field approximation to execute a somewhat different form of force-directed layout, typically via Barnes-Hut which reduces a $\mathcal O(dN^2)$ gradient-based complexity to $\mathcal O(dN\cdot \log(N))$, but the convergence properties are less well-understood for this iterative stochastic approximation method (to the best of my knowledge), and for $2 \leq d \leq 4$ the typical observed run-times are generally longer than other dimension-reduction methods. The results are often more visually interpretable than naive eigenanalysis, and depending on the distribution, often more intuitive than MDS results, which tend to preserve global structure at the expense of local structure retained by t-SNE.

MDS is already a simplification of kernel PCA, and should be extensible with alternate kernels, while kernel t-SNE is described in work by Gilbrecht, Hammer, Schulz, Mokbel, Lueks et al. I am not practically familiar with it, but perhaps another respondent may be.

I tend to select between MDS and t-SNE on the basis of contextual goals. Whichever elucidates the structure which I am interested in highlighting, whichever structure has the greater explanatory power, that is the algorithm I use. This can be considered a pitfall, as it is a form of researcher degree-of-freedom. But freedom used wisely is not such a bad thing.