How to visualize cluster data in a scatter plot

Tags: clustering, scatterplot, stata

I have a clustered dataset and want a scatter plot of two variables in which each cluster is shown on the plane at its mean value (ideally with a radius equal to its standard deviation). How does one do this in Stata?

Best Answer

Here's an example with the auto data that uses two rings per group whose areas are proportional to the standard deviations of the two variables. This is not quite what you want, but it is fairly easy:

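* Load the example data and reduce it to one row per rep78 group (means and SDs)
sysuse auto, clear
collapse (mean) price mpg (sd) sd_price = price sd_mpg = mpg, by(rep78)

* Two hollow circles per group, weighted by each SD, plus the rep78 value as a centered label
twoway (scatter price mpg [w=sd_price], msymbol(Oh)) (scatter price mpg [w=sd_mpg], msymbol(Oh)) (scatter price mpg, msymbol(none) mlabposition(0) mlabel(rep78)), legend(off)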

(Figure: bubble plot of mean price against mean mpg, one pair of rings per rep78 group)

The group labeled "." is the one with missing rep78.

This way of plotting the data does not seem like a good idea, as it obscures some features of the data. For instance, you get the sense that the SD of price is larger than the SD of mpg, but not by how much: for group 1, the former is 200 times the latter, though the two bubbles appear to be the same size.
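One possible alternative, which is not part of the original answer, is to draw the spread on the axes themselves: a vertical spike of plus or minus one SD of price and a horizontal spike of plus or minus one SD of mpg through each group mean. The sketch below assumes the same auto data; the variables price_lo, price_hi, mpg_lo and mpg_hi are names made up for this illustration, and the `///` line continuations mean it should be run from a do-file rather than typed interactively.

```stata
* Same collapse as above: one row per rep78 group with means and SDs
sysuse auto, clear
collapse (mean) price mpg (sd) sd_price = price sd_mpg = mpg, by(rep78)

* Inspect the SDs directly instead of reading them off marker sizes
list rep78 sd_price sd_mpg

* Endpoints of the +/- 1 SD spikes (illustrative variable names)
generate price_lo = price - sd_price
generate price_hi = price + sd_price
generate mpg_lo   = mpg - sd_mpg
generate mpg_hi   = mpg + sd_mpg

* Vertical spike for price, horizontal spike for mpg, group label at the mean
twoway (rspike price_lo price_hi mpg)                                       ///
       (rspike mpg_lo mpg_hi price, horizontal)                             ///
       (scatter price mpg, msymbol(none) mlabposition(0) mlabel(rep78)),    ///
       legend(off) ytitle("Mean price") xtitle("Mean mpg")
```

Because each spike is measured in the units of its own axis, the lengths can be read against the axis scales, which the weighted bubbles do not allow.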