Solved – How to visualize the true dimensionality of the data

data visualization, dimensionality reduction, pca

I have a dataset that's nominally 16-dimensional. I have about 100 samples in one case and about 20,000 in another. Based on various exploratory analyses I've conducted using PCA and heat maps, I'm convinced that the true dimensionality (i.e. the number of dimensions needed to capture most of the "signal") is around 4. I want to create a slide to that effect for a presentation. The "conventional wisdom" about this data, which I'm looking to disprove, is that the true dimensionality is one or two.

What's a good, simple visualization for showing the true dimensionality of a dataset? Preferably it should be understandable to people who have some background in statistics but are not "real" statisticians.

Best Answer

A standard approach would be to do PCA and then show a scree plot, which you ought to be able to get out of any software you might choose. With a little tinkering you could make it more interpretable for your particular audience if necessary. Scree plots can sometimes be convincing, but often they're ambiguous, and there's always room to quibble about how to read them, so a scree plot may not be ideal. Worth a look though.
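As a rough illustration of what goes into a scree plot, here is a minimal Python sketch. It uses made-up synthetic data (16 features driven by 4 latent factors plus noise), not the asker's dataset, and the function names `explained_variance_ratio`, `make_synthetic` and `plot_scree` are hypothetical helpers invented for this example. It assumes the features are on comparable scales; if they are not, standardizing each column first is a common choice.

```python
import numpy as np


def explained_variance_ratio(X):
    """Fraction of total variance captured by each principal component."""
    Xc = X - X.mean(axis=0)                  # center each feature
    s = np.linalg.svd(Xc, compute_uv=False)  # singular values, descending
    var = s ** 2                             # proportional to PC variances
    return var / var.sum()


def make_synthetic(n_samples=100, n_features=16, n_latent=4, noise=0.1, seed=0):
    """Illustrative data only: n_latent hidden factors mixed into n_features."""
    rng = np.random.default_rng(seed)
    latent = rng.normal(size=(n_samples, n_latent))
    mixing = rng.normal(size=(n_latent, n_features))
    return latent @ mixing + noise * rng.normal(size=(n_samples, n_features))


def plot_scree(ratio, path="scree.png"):
    """Bar chart of per-component variance with the cumulative curve on top."""
    # Imported here so the numeric part above runs without matplotlib.
    import matplotlib
    matplotlib.use("Agg")
    import matplotlib.pyplot as plt

    k = np.arange(1, len(ratio) + 1)
    fig, ax = plt.subplots()
    ax.bar(k, ratio, label="per component")
    ax.plot(k, np.cumsum(ratio), marker="o", color="black", label="cumulative")
    ax.set_xlabel("Principal component")
    ax.set_ylabel("Fraction of variance explained")
    ax.set_xticks(k)
    ax.legend()
    fig.savefig(path)


ratio = explained_variance_ratio(make_synthetic())
print(np.round(np.cumsum(ratio)[:6], 3))  # cumulative share of first 6 PCs
# plot_scree(ratio)  # uncomment to write scree.png
```

For a non-specialist audience, the cumulative curve is often the easier one to read: the slide can simply point at how much variance the first one or two components leave unexplained compared with the first four.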
