Solved – Plotting Clusters over a ggplot graph in R


I am working with GPS data and want to apply density-based clustering in R.

Suppose I have produced a path from the following dataset. Wherever the density of points is high enough over a particular area (as shown in the graph), that area should form a cluster.

My ggplot is produced from data like this:

Lat         Long
92.14894444 50.01011111
92.14894444 50.01011111
92.14825    50.01491667
92.15875    50.01502778
92.15708333 49.98458333
92.16005556 49.98566667
92.16266667 49.99105556
92.16119444 50.00330556
92.16475    50.01558333
....
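For a reproducible example, the nine rows shown can be typed in as a data frame named `df`, the name the plotting code in this question uses; the rows elided by `....` are not included. Note that every value in the `Lat` column exceeds 90, which is impossible for a latitude, so the two column labels appear to be swapped. This sketch keeps the labels exactly as given.

```r
# The nine sample rows from the question; the elided rows are not reproduced.
# Column labels are kept as in the question (Lat values > 90 suggest they are swapped).
df <- data.frame(
  Lat  = c(92.14894444, 92.14894444, 92.14825, 92.15875, 92.15708333,
           92.16005556, 92.16266667, 92.16119444, 92.16475),
  Long = c(50.01011111, 50.01011111, 50.01491667, 50.01502778, 49.98458333,
           49.98566667, 49.99105556, 50.00330556, 50.01558333)
)
str(df)
```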

I don't want to predefine the number of clusters, so I think a density-based clustering algorithm is a good fit here. If you have any other solution that produces similar results, you are welcome to suggest it.


I want to draw these black circles over my ggplot.

I tried density-based clustering, but it does not produce very good results.

When I look at the clusters produced by density-based clustering, they are not meaningful: some clusters contain points that are too far apart. I want dense clusters that are not too large (say, within a radius of 1 km). The output I want is the one shown in the ggplot above.

This is what I have produced so far:

library(ggplot2)
# Scatter plot of the GPS points
sp <- ggplot(df, aes(x = Lat, y = Long)) + geom_point()
# Overlay contour lines of a 2D kernel density estimate
sp + geom_density2d()


Best Answer

First, a few words on terminology. You talk about density-based clustering, which covers methods that identify clusters as regions of the data with a sufficiently high point density. This is only one class of clustering algorithms. From the arguments you made, I assume you mean one specific density-based clustering algorithm, namely DBSCAN.
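To make the DBSCAN idea concrete, here is a minimal, unoptimised version written in base R. The function `dbscan_sketch` is hypothetical and not part of any package. In this sketch, a point is a core point if at least `MinPts` points (itself included, as in the original DBSCAN definition) lie within distance `eps`; clusters grow outward from core points, and points reachable from no core point are labelled 0 (noise), the same convention fpc uses.

```r
# Minimal DBSCAN sketch in base R (hypothetical helper; O(n^2) memory, illustration only).
# Returns an integer vector: 0 = noise, 1, 2, ... = cluster ids.
dbscan_sketch <- function(x, eps, MinPts) {
  d <- as.matrix(dist(x))                                  # pairwise Euclidean distances
  n <- nrow(d)
  neighbours <- lapply(seq_len(n), function(i) which(d[i, ] <= eps))  # includes i itself
  is_core <- lengths(neighbours) >= MinPts
  cluster <- integer(n)                                    # 0 = unassigned / noise
  current <- 0L
  for (i in seq_len(n)) {
    if (cluster[i] != 0L || !is_core[i]) next
    current <- current + 1L                                # new cluster from an unassigned core point
    cluster[i] <- current
    queue <- i
    while (length(queue) > 0) {
      p <- queue[1]
      queue <- queue[-1]
      if (!is_core[p]) next                                # border points join but do not expand
      for (q in neighbours[[p]]) {
        if (cluster[q] == 0L) {
          cluster[q] <- current
          queue <- c(queue, q)
        }
      }
    }
  }
  cluster
}

# Usage on synthetic data: two tight groups and one isolated outlier
set.seed(1)
pts <- rbind(
  matrix(rnorm(20, 0, 0.1), ncol = 2),   # 10 points around (0, 0)
  matrix(rnorm(20, 5, 0.1), ncol = 2),   # 10 points around (5, 5)
  c(10, -10)                             # outlier
)
cl <- dbscan_sketch(pts, eps = 0.5, MinPts = 3)
table(cl)
```

On real GPS data, `fpc::dbscan` (or the faster `dbscan` package) is the better choice; the sketch only shows the mechanics.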

The ggplot geometry density2d in your sample call is something entirely different: a two-dimensional kernel density estimate, which fits a smooth function to your data that models the density of the underlying distribution. The circles it draws are contour lines of this estimated density; it does not assign any point to a cluster.
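To see what `geom_density2d` computes, the same kind of estimate can be produced directly. ggplot2's `stat_density_2d` is built on `MASS::kde2d`, which evaluates a Gaussian kernel density estimate on a regular grid; the contour lines of that grid are the "circles" in the plot. A sketch using the nine sample rows from the question:

```r
library(MASS)  # kde2d(): two-dimensional Gaussian kernel density estimate

# Nine sample rows from the question (column labels as given there)
df <- data.frame(
  Lat  = c(92.14894444, 92.14894444, 92.14825, 92.15875, 92.15708333,
           92.16005556, 92.16266667, 92.16119444, 92.16475),
  Long = c(50.01011111, 50.01011111, 50.01491667, 50.01502778, 49.98458333,
           49.98566667, 49.99105556, 50.00330556, 50.01558333)
)

# Evaluate the density estimate on a 50 x 50 grid spanning the data
dens <- kde2d(df$Lat, df$Long, n = 50)

# Contour lines of this surface are what geom_density2d() draws
plot(df$Lat, df$Long, xlab = "Lat", ylab = "Long")
contour(dens$x, dens$y, dens$z, add = TRUE)
```

The result is a continuous surface, not a partition of the points, which is why it cannot directly tell you which points belong to which cluster.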

I still believe DBSCAN may be the right algorithm for you. In R it is easy to apply DBSCAN to your dataset using the dbscan function from the fpc package:

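library(fpc)
# eps: neighbourhood radius, in the units of your data (degrees for raw lat/long)
# MinPts: minimum number of points within eps for a point to count as a core point
ds <- dbscan(yourdata, eps=0.01, MinPts=5)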

For the meaning of the parameters eps and MinPts, I recommend reading the Wikipedia article on DBSCAN.
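Because the coordinates are raw degrees, `eps = 0.01` is not a fixed distance. On a sphere of radius 6371 km, one degree of latitude is about 111 km, while a degree of longitude shrinks by the cosine of the latitude (about 71 km near 50°). Near 50°, eps = 0.01 therefore covers roughly 1.1 km north-south but only about 0.7 km east-west. To state eps in kilometres, as the 1 km requirement suggests, one option is to project the points onto a local flat approximation first (an equirectangular approximation, adequate over a few kilometres). The helper `to_km` is hypothetical, and the sketch assumes that the values near 50 are latitudes and those near 92 longitudes, since latitudes cannot exceed 90.

```r
# Hypothetical helper: equirectangular projection of lat/lon (degrees) to km.
# Absolute values are meaningless; only differences (distances) matter.
# Adequate over a few km, not over large areas.
to_km <- function(lat, lon, lat0 = mean(lat)) {
  R <- 6371                                   # mean Earth radius in km
  rad <- pi / 180
  cbind(x = R * lon * rad * cos(lat0 * rad),  # east-west, km
        y = R * lat * rad)                    # north-south, km
}

# Sample rows; assumption: the question's "Long" column holds latitudes (~50)
# and its "Lat" column holds longitudes (~92).
lon <- c(92.14894444, 92.14894444, 92.14825, 92.15875, 92.15708333,
         92.16005556, 92.16266667, 92.16119444, 92.16475)
lat <- c(50.01011111, 50.01011111, 50.01491667, 50.01502778, 49.98458333,
         49.98566667, 49.99105556, 50.00330556, 50.01558333)
xy <- to_km(lat, lon)

# Ground covered by eps = 0.01 degrees near this latitude
km_per_deg_lat <- 6371 * pi / 180                              # about 111.2 km
km_per_deg_lon <- km_per_deg_lat * cos(mean(lat) * pi / 180)   # about 71.5 km
c(north_south = 0.01 * km_per_deg_lat, east_west = 0.01 * km_per_deg_lon)
```

With `xy` in kilometres, `dbscan(xy, eps = 1, MinPts = 5)` from fpc would use a 1 km neighbourhood in every direction. Keep in mind that eps bounds the distance between neighbouring points, not the size of a cluster: a chain of points, each within eps of the next, can still form a cluster several kilometres long, which may explain why some clusters contained points that were too far apart.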

The clusters found can then be plotted with (fpc's plot method needs the data as its second argument)

plot(ds, yourdata)

This is not the display with points and circles from your example, but it colour-codes the clusters. Might that be enough for you? Otherwise, turning clusters into enclosing circles may well be the hardest part of your problem and, depending on the data, may even give misleading results, because nothing guarantees that a cluster is roughly circular.
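If circles are still wanted, one simple option is to draw, for each DBSCAN cluster, the circle centred on the cluster's mean position with a radius equal to the distance to its farthest member. This is a sketch, not a definitive method: `cluster_circles` is a hypothetical helper, the data are synthetic points already in kilometres (so the circles are round and radii are in km), and it assumes the fpc and ggplot2 packages are installed. As noted above, such a circle can badly over-cover an elongated cluster.

```r
library(fpc)      # dbscan(); cluster 0 = noise
library(ggplot2)

# Synthetic points in km (hypothetical data): two dense spots plus scattered noise
set.seed(42)
pts <- data.frame(
  x = c(rnorm(30, 0, 0.2), rnorm(30, 4, 0.2), runif(10, -2, 6)),
  y = c(rnorm(30, 0, 0.2), rnorm(30, 3, 0.2), runif(10, -2, 5))
)

ds <- dbscan(pts, eps = 0.5, MinPts = 5)
pts$cluster <- factor(ds$cluster)

# Hypothetical helper: centre = mean position, radius = distance to farthest member
cluster_circles <- function(x, y, cluster, n = 100) {
  ids <- setdiff(unique(cluster), 0)              # skip noise
  theta <- seq(0, 2 * pi, length.out = n)         # closed outline
  do.call(rbind, lapply(ids, function(k) {
    in_k <- cluster == k
    cx <- mean(x[in_k])
    cy <- mean(y[in_k])
    r  <- max(sqrt((x[in_k] - cx)^2 + (y[in_k] - cy)^2))
    data.frame(cluster = k, x = cx + r * cos(theta), y = cy + r * sin(theta))
  }))
}

circles <- cluster_circles(pts$x, pts$y, ds$cluster)

# Points coloured by cluster, with a black circle around each cluster
p <- ggplot(pts, aes(x, y)) +
  geom_point(aes(colour = cluster)) +
  geom_path(data = circles, aes(x, y, group = cluster), colour = "black") +
  coord_equal()
print(p)
```

On GPS data, the coordinates would first need projecting to kilometres so that the radius has a consistent meaning in every direction.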