Looking for 2D artificial data to demonstrate properties of clustering algorithms

Tags: clustering, data visualization, dataset, distributions

I am looking for datasets of two-dimensional data points (each data point is a vector of two values, (x, y)) that follow different distributions and shapes. Code to generate such data would also be helpful. I want to use them to plot and visualise how various clustering algorithms perform. Here are some examples:

Best Answer

R comes with many datasets, and it should not be hard to reproduce most of the examples you cited with a few lines of code. You may also find the mlbench package useful, in particular its synthetic data generators, whose names start with mlbench.*. Some illustrations are given below.

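If you would rather not use R, the same kind of shapes take only a few lines in any language. Below is a minimal Python sketch (standard library only) of two interleaved spirals, a classic case where centroid-based methods fail. It is a hand-rolled generator in the spirit of mlbench.spirals, not a port of that function; the function and parameter names (`two_spirals`, `turns`, `noise`) are my own illustrative choices.

```python
import math
import random


def two_spirals(n_per_class=200, turns=1.0, noise=0.05, seed=0):
    """Return (x, y, label) points lying on two interleaved spirals.

    Illustrative sketch only: loosely inspired by R's mlbench.spirals,
    but not a port of it. Parameter names are hypothetical.
    """
    rng = random.Random(seed)
    points = []
    # The second arm is the first one rotated by 180 degrees.
    for label, phase in ((0, 0.0), (1, math.pi)):
        for i in range(n_per_class):
            frac = i / n_per_class        # 0 <= frac < 1 along the arm
            angle = 2 * math.pi * turns * frac + phase
            radius = frac                  # radius grows linearly with angle
            x = radius * math.cos(angle) + rng.gauss(0, noise)
            y = radius * math.sin(angle) + rng.gauss(0, noise)
            points.append((x, y, label))
    return points


data = two_spirals(n_per_class=150, turns=1.5)
```

To look at the result, pass the x and y columns to any scatter plot (e.g. matplotlib's `scatter`) and colour by the third column, or by the labels your clustering algorithm returns.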

You will find additional examples by looking at the Cluster Task View on CRAN. For example, the fpc package has a built-in generator for "face-shaped" clustered benchmark datasets (rFace).

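A face-like layout is also easy to mimic by hand, which is handy for showing clusters of very different shapes and densities in one picture. The Python sketch below is hypothetical: it does not reproduce fpc's rFace (or mlbench.smiley); the component positions, spreads and the `face_points` name are arbitrary choices for illustration. Labels 0 and 1 are the eyes, 2 the nose and 3 the mouth.

```python
import math
import random


def face_points(n=300, sd=0.1, seed=1):
    """Sample a crude 'face': two round eye blobs, a thin vertical nose
    and a smiling mouth arc. Returns (x, y, part) triples.

    Hypothetical layout, not the algorithm used by fpc::rFace.
    """
    rng = random.Random(seed)
    points = []
    for _ in range(n):
        part = rng.randrange(4)  # 0/1 = eyes, 2 = nose, 3 = mouth
        if part == 0:    # left eye: round Gaussian blob
            x, y = rng.gauss(-1.0, sd), rng.gauss(1.0, sd)
        elif part == 1:  # right eye
            x, y = rng.gauss(1.0, sd), rng.gauss(1.0, sd)
        elif part == 2:  # nose: narrow, elongated vertical cluster
            x, y = rng.gauss(0.0, sd / 2), rng.uniform(-0.3, 0.5)
        else:            # mouth: lower arc of a circle, i.e. a smile
            angle = rng.uniform(math.radians(200), math.radians(340))
            x = 1.2 * math.cos(angle) + rng.gauss(0, sd / 2)
            y = 1.2 * math.sin(angle) - 0.2 + rng.gauss(0, sd / 2)
        points.append((x, y, part))
    return points


face = face_points()
```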

Similar considerations apply to Python, where scikit-learn offers interesting clustering benchmarks and datasets.
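If scikit-learn is installed, `sklearn.datasets` already has generators such as `make_blobs`, `make_moons` and `make_circles` (e.g. `make_moons(n_samples=200, noise=0.05, random_state=0)` returns an `(X, y)` pair). To show the idea without that dependency, here is a standard-library sketch of the same two shapes: interleaving half circles and concentric rings. The geometry and the names `two_moons` / `concentric_circles` are my own; treat them as illustrative rather than a copy of scikit-learn's implementation.

```python
import math
import random


def two_moons(n_per_moon=100, noise=0.05, seed=0):
    """Two interleaving half circles, labelled 0 (upper) and 1 (lower)."""
    rng = random.Random(seed)
    points = []
    steps = max(n_per_moon - 1, 1)
    for i in range(n_per_moon):
        t = math.pi * i / steps  # evenly spaced angles in [0, pi]
        # Upper moon: upper half of the unit circle centred at (0, 0).
        points.append((math.cos(t) + rng.gauss(0, noise),
                       math.sin(t) + rng.gauss(0, noise), 0))
        # Lower moon: lower half of a unit circle centred at (1, 0.5).
        points.append((1 - math.cos(t) + rng.gauss(0, noise),
                       0.5 - math.sin(t) + rng.gauss(0, noise), 1))
    return points


def concentric_circles(n_per_ring=100, factor=0.5, noise=0.05, seed=0):
    """Outer ring of radius 1 (label 0) around an inner ring of radius `factor` (label 1)."""
    rng = random.Random(seed)
    points = []
    for i in range(n_per_ring):
        t = 2 * math.pi * i / n_per_ring
        for label, r in ((0, 1.0), (1, factor)):
            points.append((r * math.cos(t) + rng.gauss(0, noise),
                           r * math.sin(t) + rng.gauss(0, noise), label))
    return points
```

Both shapes are non-convex, so k-means typically splits them along a straight line, while density-based methods such as DBSCAN can recover them with suitable parameters.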

The UCI Machine Learning Repository hosts many datasets as well, but for this purpose you're better off simulating the data yourself in the language of your choice.
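As an example of simulating your own data, a Gaussian mixture with hand-picked means and covariances lets you control cluster size, spread and orientation directly. The Python sketch below draws correlated 2D normals through the Cholesky factor of each 2x2 covariance matrix; the `SPECS` values are hypothetical, chosen to give one tight round cluster, one elongated tilted cluster and one large diffuse cluster, a mix that k-means tends to handle poorly.

```python
import math
import random


def gaussian_blob(n, mean, cov, rng):
    """Draw n points from a 2-D normal with the given mean and 2x2 covariance.

    Uses the Cholesky factor L of cov (cov = L L^T) to turn independent
    standard normals (z1, z2) into correlated ones.
    """
    (a, b), (b2, c) = cov
    if b != b2:
        raise ValueError("covariance matrix must be symmetric")
    l11 = math.sqrt(a)
    l21 = b / l11
    l22 = math.sqrt(c - l21 * l21)  # raises if cov is not positive definite
    mx, my = mean
    points = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        points.append((mx + l11 * z1, my + l21 * z1 + l22 * z2))
    return points


def gaussian_mixture(specs, seed=0):
    """specs: list of (n, mean, cov) tuples; returns (x, y, label) triples."""
    rng = random.Random(seed)
    data = []
    for label, (n, mean, cov) in enumerate(specs):
        data.extend((x, y, label) for x, y in gaussian_blob(n, mean, cov, rng))
    return data


# Hypothetical settings: tight round, elongated/tilted, large/diffuse.
SPECS = [
    (100, (0.0, 0.0), ((0.05, 0.0), (0.0, 0.05))),
    (200, (3.0, 0.0), ((1.0, 0.9), (0.9, 1.0))),
    (300, (0.0, 4.0), ((2.0, 0.0), (0.0, 2.0))),
]
data = gaussian_mixture(SPECS)
```

Changing one entry of `SPECS` at a time (more overlap, stronger correlation, unequal sizes) is a simple way to show where a given clustering algorithm starts to break down.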
