Solved – Should data be centered+scaled before applying t-SNE

Tags: dimensionality-reduction, high-dimensional, normalization, tsne

Some of my data's features have large values, while other features have much smaller values.

Is it necessary to center+scale data before applying t-SNE to prevent bias towards the larger values?

I use Python's sklearn.manifold.TSNE implementation with the default euclidean distance metric.

Best Answer

Centering shouldn't matter, since the algorithm only operates on distances between points and shifting every point by the same vector leaves those distances unchanged. Rescaling, however, is necessary if you want the different dimensions to be treated with equal importance, since the 2-norm (Euclidean distance) will be more heavily influenced by dimensions with large variance.
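To make both points concrete, here is a minimal sketch on synthetic data. The data, the variable names, and the helper functions (`pairwise_distances`, `squared_distance_share`, `embed_standardized`) are made up for illustration; `StandardScaler` is one common way to rescale, not the only one.

```python
import numpy as np
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler


def pairwise_distances(X):
    """Euclidean distance between every pair of rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def squared_distance_share(X):
    """Fraction of the total squared pairwise distance contributed by each feature."""
    diff = X[:, None, :] - X[None, :, :]
    per_feature = (diff ** 2).sum(axis=(0, 1))
    return per_feature / per_feature.sum()


rng = np.random.default_rng(0)
# Hypothetical data: feature 0 has a much larger spread than feature 1.
X = np.column_stack([
    rng.normal(loc=500.0, scale=1000.0, size=200),
    rng.normal(loc=0.5, scale=1.0, size=200),
])

# Centering is a translation, so the pairwise distances are unchanged.
D_raw = pairwise_distances(X)
D_centered = pairwise_distances(X - X.mean(axis=0))
centering_changes_distances = not np.allclose(D_raw, D_centered)

# Without rescaling, the large-variance feature dominates the distances.
share_raw = squared_distance_share(X)

# After standardizing to unit variance, both features contribute equally.
X_std = StandardScaler().fit_transform(X)
share_std = squared_distance_share(X_std)


def embed_standardized(X, random_state=0):
    """Standardize each feature, then run t-SNE with the default euclidean metric."""
    X_scaled = StandardScaler().fit_transform(X)
    return TSNE(n_components=2, random_state=random_state).fit_transform(X_scaled)
```

Calling `embed_standardized(X)` returns an array with one 2-D point per row of `X`. Whether equal weighting is what you want depends on the data: if the large-valued features really are more informative, standardizing will discard that information.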
