[GIS] Aggregating points to grid using R

aggregationpointrraster

I have a question with regard to spatial aggregation in R.
What I am trying to do is aggregate a point dataset to a grid.
I am unsure however how to do this as I have little experience with this sort of stuff.
I was hoping anyone of you might have some useful guidance/a possible solution.

My vantage point is a dataset containing georeferenced data on conflict events in Africa (see www.acleddata.com).
The points are georeferenced with latitude/longitude coordinates and contain data on event type and time.
What I want to do is aggregate these points to a 1×1 degree grid.

Thus a grid-cell should contain the information of the data points if a event happened to occur within that grid-cell. The eventual product of this should be a data frame or something that I can export to a csv-file as the data is intended to be used in a panel data-set for statistical analysis.

So far I loaded and plotted the data and the shapefile using the code below. I believe that I should use the over function from the sp package to
aggregate but I do not know how. Hope one of you can help.

The code I used so far can be found here with the corresponding visual result over there.

Suggestion for doing this in QGIS are welcome as well.

Best Answer

The data as downloaded contain some frank locational errors, so the first thing to do is limit the coordinates to reasonable values:

data.df <- read.csv("f:/temp/All_Africa_1997-2011.csv", header=TRUE, sep=",",row.names=NULL)
data.df <- subset(data.df, subset=(LONGITUDE >= -180 & LATITUDE >= -90))

Computing grid cell coordinates and identifiers is merely a matter of truncating the decimals from the latitude and longitude values. (More generally, for arbitrary rasters, first center and scale them to unit cellsize, truncate the decimals, and then rescale and recenter back to their original position, as shown in the code for ji below.) We can combine these coordinates into unique identifiers, attaching them to the input dataframe, and write the augmented dataframe out as a CSV file. There will be one record per point:

ji <- function(xy, origin=c(0,0), cellsize=c(1,1)) {
  t(apply(xy, 1, function(z) cellsize/2+origin+cellsize*(floor((z - origin)/cellsize))))
}
JI <- ji(cbind(data.df$LONGITUDE, data.df$LATITUDE))
data.df$X <- JI[, 1]
data.df$Y <- JI[, 2]
data.df$Cell <- paste(data.df$X, data.df$Y)

You might instead want output that summarizes events within each grid cell. To illustrate this, let's compute the counts per cell and output those, one record per cell:

counts <- by(data.df, data.df$Cell, function(d) c(d$X[1], d$Y[1], nrow(d)))
counts.m <- matrix(unlist(counts), nrow=3)
rownames(counts.m) <- c("X", "Y", "Count")
write.csv(as.data.frame(t(counts.m)), "f:/temp/grid.csv")

For other summaries, change the function argument in the computation of counts. (Alternatively, use spreadsheet or database software to summarize the first output file by cell identifier.)

As a check, let's map the counts using the grid centers to locate the map symbols. (The points located in the Mediterranean Sea, Europe, and the Atlantic Ocean have suspect locations: I suspect many of them result from mixing up latitude and longitude in the data entry process.)

count.max <- max(counts.m["Count",])
colors = sapply(counts.m["Count",], function(n) hsv(sqrt(n/count.max), .7, .7, .5))
plot(counts.m["X",] + 1/2, counts.m["Y",] + 1/2, cex=sqrt(counts.m["Count",]/100),
     pch = 19, col=colors,
     xlab="Longitude of cell center", ylab="Latitude of cell center",
     main="Event counts within one-degree grid cells")

Africa map

This workflow is now

  • Thoroughly documented (by means of the R code itself),

  • Reproducible (by rerunning this code),

  • Extensible (by modifying the code in obvious ways), and

  • Reasonably fast (the whole operation takes less than 10 seconds to process these 53052 observations).

Related Question