Solved – Visualizing all high-dimensional categorical data

data visualization

I'm running many combinations of hyperparameters: every choice at one pipeline stage (e.g. preprocessor, classifier) is combined with every choice at the other stages in a Cartesian product, and each combination gets a fit metric (accuracy). With only two hyperparameters, a table with one column per classifier and one row per preprocessor would work, but in this case I'm working with more than two hyperparameters.

What's a good, interactive way to display this data to a user (e.g. a person interested in exploring the data)?
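For concreteness, the kind of grid described above can be sketched as follows. This is only a minimal illustration: the hyperparameter names and values are hypothetical placeholders, not taken from the question, and no accuracy values are included because they would come from actually fitting each pipeline.

```python
from itertools import product

# Hypothetical hyperparameter choices (placeholder names, not from the question).
grid = {
    "preprocessor": ["none", "standardize", "whiten"],
    "classifier": ["logreg", "svm", "tree"],
    "feature_selection": ["all", "top_k"],
}

names = list(grid)

# Cartesian product: one dict per combination of one value per hyperparameter.
combos = [dict(zip(names, values)) for values in product(*grid.values())]

print(len(combos), "combinations of", len(names), "hyperparameters")
for combo in combos[:3]:
    print(combo)
```

With three or more hyperparameters, each combination is a point in a categorical space with one axis per hyperparameter, which is why a single rows-by-columns table no longer covers it.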

Best Answer

Displaying the data in a 3-dimensional plot that you can rotate, zoom and so on is not a terrible option.

You'll need to project the high-dimensional data onto 3 dimensions, of course. Two methods for doing this are t-SNE and PCA.

PCA projects onto the axes that maximize the variance retained in the resulting plot, which is the same as minimizing the residual variance lost in the projection. Because the projection is linear, it is fairly easy to understand intuitively. The downside is that you'll lose relationships that need a more manifold-like (nonlinear) projection to show.
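As a rough sketch of what such a PCA projection could look like for categorical data, the example below one-hot encodes a small grid and projects it onto 3 components using NumPy's SVD. The one-hot encoding and the grid sizes are assumptions made for illustration; the answer does not say how the categorical values should be turned into numbers.

```python
import numpy as np
from itertools import product

# Hypothetical grid: 3 preprocessors x 3 classifiers x 2 feature selectors.
sizes = [3, 3, 2]
combos = list(product(*(range(n) for n in sizes)))

# One-hot encode each combination: one indicator column per option (assumed encoding).
X = np.array([
    np.concatenate([np.eye(n)[i] for n, i in zip(sizes, combo)])
    for combo in combos
])

# PCA via SVD of the mean-centered data; rows of Vt are the principal axes.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

k = 3
coords = Xc @ Vt[:k].T               # 3-D coordinates, one row per combination
explained = S**2 / np.sum(S**2)      # fraction of total variance per component
retained = explained[:k].sum()       # fraction kept by the 3-D projection

print(coords.shape, f"variance retained: {retained:.2f}")
```

The `retained` value makes the trade-off concrete: whatever fraction is not retained is the residual variance that the 3-D plot cannot show.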

t-SNE is sort of the opposite: it projects onto a potentially very convoluted, complex manifold that doesn't need to have any global coherence. It can represent local structure fairly well and handle high-dimensional manifolds, but it loses any sense of the actual global structure.
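A comparable t-SNE projection might look like the sketch below, which uses scikit-learn's `TSNE` on the same kind of one-hot encoded grid. The library choice, the encoding, and the parameter values (in particular `perplexity=5`) are assumptions for illustration rather than recommendations from the answer; perplexity has to be smaller than the number of points, and results can change noticeably with it.

```python
import numpy as np
from itertools import product
from sklearn.manifold import TSNE

# Hypothetical grid: 3 preprocessors x 3 classifiers x 2 feature selectors.
sizes = [3, 3, 2]
combos = list(product(*(range(n) for n in sizes)))

# One-hot encode each combination (assumed encoding, as in a PCA setup).
X = np.array([
    np.concatenate([np.eye(n)[i] for n, i in zip(sizes, combo)])
    for combo in combos
])

# Project to 3 dimensions; random_state fixes the otherwise random layout.
tsne = TSNE(n_components=3, perplexity=5, init="random", random_state=0)
coords = tsne.fit_transform(X)

print(coords.shape)
```

Distances between far-apart clusters in `coords` should not be read as meaningful, which is the loss of global structure described above.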

As an example, if you have two interlocking rings, t-SNE can show them as two flat, separated, non-interlocking rings. This page has some very interesting examples: https://distill.pub/2016/misread-tsne/

At a practical level, TensorFlow's TensorBoard includes both a t-SNE projector and viewer and a PCA projector and viewer: https://www.tensorflow.org/programmers_guide/embedding#visualizing_embeddings
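One way to get a hyperparameter grid into that viewer is to write it out as tab-separated files, which the Embedding Projector can load as vectors plus metadata. The sketch below assumes a one-hot encoding and uses hypothetical hyperparameter names and file names; a measured accuracy column could be appended to the metadata so that points can be colored by it.

```python
import csv
import tempfile
from itertools import product
from pathlib import Path

# Hypothetical hyperparameter choices (placeholder names).
grid = {
    "preprocessor": ["none", "standardize", "whiten"],
    "classifier": ["logreg", "svm", "tree"],
    "feature_selection": ["all", "top_k"],
}
names = list(grid)
combos = list(product(*grid.values()))

# One indicator column per (hyperparameter, value) pair (assumed encoding).
columns = [(name, value) for name in names for value in grid[name]]

out_dir = Path(tempfile.mkdtemp())
vectors_path = out_dir / "vectors.tsv"    # hypothetical file names
metadata_path = out_dir / "metadata.tsv"

# Vectors file: one row of numbers per combination, no header.
with open(vectors_path, "w", newline="") as f:
    writer = csv.writer(f, delimiter="\t", lineterminator="\n")
    for combo in combos:
        chosen = dict(zip(names, combo))
        writer.writerow([1 if chosen[n] == v else 0 for n, v in columns])

# Metadata file: a header row (used when there are several columns),
# then one row of labels per combination, in the same order as the vectors.
with open(metadata_path, "w", newline="") as f:
    writer = csv.writer(f, delimiter="\t", lineterminator="\n")
    writer.writerow(names)
    writer.writerows(combos)

print("wrote", vectors_path, "and", metadata_path)
```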