[GIS] Color region polygons according to cluster label

cartography, clustering, ggplot2, polygon, r

I want to present data on a map of our country.

I used R to show the clusters on a map of the country. Here are the steps I took:

library(RODBC)  # provides odbcConnectExcel() and sqlFetch()

setwd('D:/r/cluster2')
channel <- odbcConnectExcel('cluster.xls')
data <- sqlFetch(channel, 'clust9')
y9 <- data.frame(inf=data$infest, faible=data$faible, moyen=data$moyen, fort=data$fort, lon=data$Lon, lat=data$Lat)

library(fossil)

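# earth.dist() reads longitude from the first column and latitude from the second,
# so pass only the coordinate columns
d <- earth.dist(y9[, c("lon", "lat")])
set.seed(123)  # set the seed before kmeans() so its random starts are reproducible
km <- kmeans(d, centers=5)
hc <- hclust(d)
clust <- cutree(hc, k=5)
plot(hc)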

library(rgdal)    # readOGR()
library(ggplot2)  # fortify(), ggplot()

y9$clust <- cutree(hc, k=5)
map.AL  <- readOGR(dsn="D:/r/cluster/shp", layer="ALG_boundaries")
map.df  <- fortify(map.AL)

# y9 stores its coordinates as lon/lat; map.df (from fortify) uses long/lat
ggplot(map.df)+ geom_path(aes(x=long, y=lat, group=group))+
  geom_point(data=y9, aes(x=lon, y=lat, color=factor(clust)), size=4)+
  scale_color_discrete("Cluster") + coord_fixed()

The map shows only a point for each region, colored by cluster. What I want instead is to color each whole region according to its cluster, not just a point.

Best Answer

You can add a new column to the spatial data's data.frame and assign each province its cluster label. Then, in ggplot, use geom_polygon together with scale_fill_manual to map each cluster to a specific color.

Here is one example:

#Import shapefiles
require(rgdal)
#Read shapefile with Algeria's province boundaries (download from: http://www.diva-gis.org/gdata)
alg_provinces = readOGR(dsn="C:...\\DZA_adm", layer="DZA_adm1")

require(ggplot2)
#Get rownames from "SpatialPolygonsDataFrame" object, slot "data"
alg_provinces@data$id = rownames(alg_provinces@data)
#Convert the "SpatialPolygonsDataFrame" object into a "data.frame"
alg.points = fortify(alg_provinces,region="id")

require(plyr)
#Join geometries (.shp file) information with information derived from the attribute table (.dbf file), in object of class "data.frame"
alg.df = join(alg.points, alg_provinces@data, by="id")

#This is the part you will have to adapt to your own data.
#Here I manually assigned a cluster label to each province.
#I tried to make the example match yours as closely as possible.
cluster = data.frame(id=alg_provinces@data$id,
                     alg_provinces@data$NAME_1,
                     cluster=c("0","1","2","1","3","2","0","1","0","3",
                               "3","3","1","1","3","5","1","0","3","0",
                               "5","0","3","1","2","5","1","5","1","3",
                               "2","1","0","3","5","1","2","1","3","4",
                               "0","4","5","0","3","1","0","4"))

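#Merge cluster labels with the spatial data
alg = merge(alg.df,cluster,by="id")
#merge() may reorder rows; restore the vertex drawing order recorded by fortify()
alg = alg[order(alg$order), ]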

#Plot with ggplot using the layers "geom_polygon" and "geom_path", and assign each cluster its color with "scale_fill_manual"
#group=group (created by fortify) keeps multi-part provinces from being joined by stray lines
ggplot(alg) + 
    geom_polygon(aes(x=long,y=lat,group=group,fill=cluster)) +
    scale_fill_manual(values = c("0"="white","1" = "red","2"="yellow","3"="green","4"="blue","5"="purple")) +
    geom_path(aes(x=long,y=lat,group=group)) +
    coord_equal() + 
    theme_bw() + xlab("Longitude") + ylab("Latitude")
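
If each point in your data also carries the name of its province, the manual assignment above could probably be automated. Here is a minimal sketch in base R. It assumes a hypothetical `province` column (the question's y9 only has coordinates) and uses a small toy vertex table, `verts`, standing in for the output of fortify(). The province names and coordinates are made up for illustration.

```r
# Toy point-level data: one row per sampled site.
# 'province' is a hypothetical column; it is not present in the question's y9.
sites <- data.frame(
  province = c("Adrar", "Adrar", "Adrar", "Alger", "Alger", "Oran"),
  clust    = c(1, 1, 2, 3, 3, 2)
)

# Majority vote: the most frequent cluster label among a province's sites.
# On a tie, which.max() keeps the first (smallest) label.
majority <- function(x) names(which.max(table(x)))
votes <- tapply(sites$clust, sites$province, majority)

# One row per province, with the label stored as a character string
# so it can be matched against the names used in scale_fill_manual().
labels <- data.frame(NAME_1  = names(votes),
                     cluster = as.vector(votes),
                     stringsAsFactors = FALSE)

# Toy stand-in for the fortified polygon table: three vertices per province.
verts <- data.frame(
  long   = c(0, 1, 1, 3, 4, 4, -1, 0, 0),
  lat    = c(27, 27, 28, 36, 36, 37, 35, 35, 36),
  order  = 1:9,
  NAME_1 = rep(c("Adrar", "Alger", "Oran"), each = 3)
)

# Attach the cluster label to every vertex, then restore the drawing order,
# because merge() does not promise to keep the rows in their original order.
alg_toy <- merge(verts, labels, by = "NAME_1")
alg_toy <- alg_toy[order(alg_toy$order), ]
```

The resulting `alg_toy` has the same shape as `alg` above, so it could be passed to the same ggplot call. With real data you would still need some way to find out which province each point falls in, which this sketch does not cover.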
