[GIS] creating clusters of points with same attributes

clustering, fields-attributes, point, qgis

I have a file of some 4000 points, which belong to around 400 different categories (species of rare plants). I would like to remove duplicates within the categories (which could be repeated observations of the same population, at different dates or slightly different coordinates).

Can I perform some kind of cluster analysis, but only within each species? I don't want to divide it up into 400 layers! Ideally each cluster would be limited to around 1000 m, and if each point could have the ID of its cluster added as an attribute, that would be perfect. Choosing which points to discard from each cluster is difficult: the most recent year would usually be the best choice, but some records have coordinates at a better resolution than others, and others contain valuable information in the free-text comment attribute.

I'm relatively new to QGIS and probably not able to use Python or other methods involving code without some serious help!

This is an example of the sort of data I'm discussing (coordinates are UK Ordnance Survey grid references). As you can see, the first three lines are very close to each other but were recorded at different times. I want to amalgamate these, or at least identify them as a cluster, and eventually delete the older records.

table of data I would like

Best Answer

You want to look at Hierarchical Clustering to build your clusters. This will let you specify a cluster size based on your distance of interest (say, 1000m), rather than a number of clusters or a number of points within the cluster.
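If you ever want to see the idea outside QGIS, here is a minimal sketch using scipy directly. The coordinates are made-up eastings/northings in metres, and the 1000 m cut-off is just the distance from the question. It builds the cluster tree and then cuts it at that distance, so the number of clusters falls out of the data instead of being chosen up front.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical easting/northing pairs in metres (invented for illustration)
coords = np.array([
    [450000.0, 300000.0],
    [450200.0, 300100.0],
    [450400.0, 300050.0],
    [455000.0, 305000.0],
    [455300.0, 305200.0],
])

# Build the hierarchical cluster tree from Euclidean distances
tree = linkage(coords, method="complete", metric="euclidean")

# Cut the tree at 1000 m: points end up in the same cluster only if
# they were merged at a distance of 1000 m or less
labels = fcluster(tree, t=1000.0, criterion="distance")

print(labels)  # one integer cluster ID per input point
```

With this data the first three points share one cluster ID and the last two share another.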

(Shameless plug) I've built a QGIS Processing plugin to implement clustering from the scipy library: Scipy Point Clustering. In this plugin is a tool for Hierarchical Clustering by Identifier where you can select a column in the point dataset to guarantee that only features with the same identifier will be clustered together. In your case you could use the species for example. It will then add a label field to the dataset with a cluster ID.
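To give a rough idea of what "clustering by identifier" means, here is a sketch of the general approach in plain scipy. It is not the plugin's actual code. The function name `cluster_by_identifier`, the sample coordinates and the species names are all made up for illustration. Each species is clustered on its own, and the cluster IDs are offset so they stay unique across the whole dataset.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def cluster_by_identifier(coords, identifiers, tolerance, method="complete"):
    """Cluster points separately for each identifier value.

    Hypothetical helper for illustration; returns one cluster ID per point.
    """
    coords = np.asarray(coords, dtype=float)
    identifiers = np.asarray(identifiers)
    labels = np.zeros(len(coords), dtype=int)
    next_id = 1  # first unused cluster ID
    for ident in np.unique(identifiers):
        idx = np.flatnonzero(identifiers == ident)
        if len(idx) == 1:
            # linkage() needs at least two points; a lone point is its own cluster
            local = np.array([1])
        else:
            tree = linkage(coords[idx], method=method)
            local = fcluster(tree, t=tolerance, criterion="distance")
        # Shift the per-group IDs so they do not collide with other groups
        labels[idx] = local + next_id - 1
        next_id += int(local.max())
    return labels

# Invented sample: two nearby records of one species, one record of another
# species sitting between them, and a distant record of the first species
coords = [[0.0, 0.0], [100.0, 0.0], [50.0, 0.0], [5000.0, 0.0]]
species = ["Species A", "Species A", "Species B", "Species A"]

labels = cluster_by_identifier(coords, species, tolerance=1000.0)
print(labels)
```

The "Species B" point lies between the two nearby "Species A" points but still gets its own cluster ID, because points are only ever compared within the same species.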

The plugin is marked as experimental, so in the Manage and Install Plugins window you need to go to ‘Settings’ and tick ‘Show also experimental plugins’, if it isn't already ticked.

screenshot plugins

I'd recommend playing with the linkage method when building the clusters. I find that single or complete linkage is the most useful most of the time:

  • single linkage adds a point to a cluster if it is within the tolerance distance of at least one other point in that cluster
  • complete linkage requires that all points in the cluster are within the tolerance distance of each other
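The difference between the two is easiest to see on a chain of points. This is a small sketch with invented coordinates: four points along a line, with each neighbour less than 1000 m away but the two ends 2200 m apart.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Four hypothetical points in a row (metres); gaps of 700, 800 and 700 m
chain = np.array([
    [0.0, 0.0],
    [700.0, 0.0],
    [1500.0, 0.0],
    [2200.0, 0.0],
])

tolerance = 1000.0

# Single linkage: a point only needs one neighbour within the tolerance,
# so the whole chain links up
single_labels = fcluster(linkage(chain, method="single"),
                         t=tolerance, criterion="distance")

# Complete linkage: every pair in a cluster must be within the tolerance,
# so the chain is split
complete_labels = fcluster(linkage(chain, method="complete"),
                           t=tolerance, criterion="distance")

print("single:  ", single_labels)
print("complete:", complete_labels)
```

Single linkage puts all four points in one cluster even though the ends are 2200 m apart, while complete linkage splits them into two pairs. For repeat observations of the same population, complete linkage is the stricter reading of a 1000 m limit.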

The tool's help describes the other parameters, though I tend not to adjust those as much.
