Solved – Visualizing high dimensional binary data

binary data, data visualization

What is a good way to visualize high-dimensional binary data (say n = 10 dimensions)? I remember reading something about that a few years ago.

Say, for instance, you want to plot / cluster pizzas based on their toppings, e.g. ham, chicken, mushrooms, etc.

Best Answer

Even though the data are binary, you can still run a scaled Principal Component Analysis (PCA). Projecting the observations onto the 2D plane spanned by the first two principal components gives you an idea of how your data cluster.

In R:

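# data is your data.frame/matrix of 0/1 values, one row per observation
# (scale.=TRUE fails on constant columns, so drop those first)
pca <- prcomp(data, scale.=TRUE)
# Scree plot to see how much variance the leading components capture
plot(pca)
# Projections onto the first two principal components.
# pca$x holds the centered and scaled data multiplied by pca$rotation;
# multiplying the raw data by the rotation would skip that centering/scaling.
plot(pca$x[,1:2])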
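
To try this out without real data, here is a minimal sketch on simulated pizzas. Everything about the toy data is an assumption made for illustration: the topping names, the two groups ("meat" and "veg"), and their topping probabilities are invented, and the helper `draw` is hypothetical. Only `prcomp`, `rbinom`, and `plot` come from base R.

```r
# Hypothetical toy data: 60 pizzas x 10 binary toppings (all values invented)
set.seed(1)
toppings <- c("ham", "chicken", "mushrooms", "olives", "onion",
              "pepper", "pineapple", "salami", "spinach", "tomato")
# Assumed per-topping probabilities for two made-up groups of pizzas
p_meat <- c(0.9, 0.8, 0.3, 0.2, 0.4, 0.3, 0.2, 0.9, 0.1, 0.3)
p_veg  <- c(0.1, 0.1, 0.8, 0.7, 0.6, 0.8, 0.2, 0.1, 0.8, 0.7)

# Hypothetical helper: n pizzas, column j drawn as Bernoulli(p[j])
draw <- function(n, p) {
  matrix(rbinom(n * length(p), 1, rep(p, each = n)), nrow = n)
}
pizzas <- rbind(draw(30, p_meat), draw(30, p_veg))
colnames(pizzas) <- toppings
group <- rep(c("meat", "veg"), each = 30)

# Drop constant columns, which scale. = TRUE cannot rescale
pizzas <- pizzas[, apply(pizzas, 2, var) > 0, drop = FALSE]

# Scaled PCA; pca$x already holds the centered and scaled projections
pca <- prcomp(pizzas, scale. = TRUE)
scores <- pca$x[, 1:2]

# Scatter plot of the first two components, colored by the simulated group
plot(scores, col = ifelse(group == "meat", "red", "blue"), pch = 19,
     xlab = "PC1", ylab = "PC2")

# Share of total variance captured by the 2D plane
var_explained <- sum(pca$sdev[1:2]^2) / sum(pca$sdev^2)
print(var_explained)
```

If the two simulated groups separate along PC1, the projection is picking up the group structure. With real data there is no known `group`, so the point colors would have to come from a clustering step or be left out.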