R – Error When Intersecting Large Spatial Polygons Data

intersectionr

I am trying to intersect two polygons that are SpatialPolygonsDataFrame. I am finally getting output after intersecting a small subset of the data frames. But when i run this on the entire dataset which is about 100,000 polygons in each of the data frame, i get an error message "rgeos_binpredfunc_prepared: maximum returned dense matrix size exceeded"
I tried splitting up the data frames using split and a group called district. I then ran intersect on a loop but that is so super slow.

sample dataset – https://drive.google.com/open?id=1Ha47ZqwG7bCuxExJk6zOs8KoqDiqD5A4

rm(list=ls())
##LOADING DATASETS REQUIRED FOR THIS
load("sampledata.Rdata")

#booths data - dataframe - upbooths_raw
#booths data - spatial file - up_booths_sample
#village data - up_villages_sample
#loading libraries that i might use
library(rgeos)
library(raster)

#new CRS which is good for Uttar Pradesh India
newcrs <- CRS("+proj=utm +zone=42N +datum=WGS84 +units=km")

#transforms booths and villages to new crs
up_booths_transform <- spTransform(up_booths, newcrs)
up_villages_transform<- spTransform(up_villages_sample,newcrs )

#create booths with 1km buffer
up_booths_transform_buffer1km <- gBuffer(up_booths_transform, byid = TRUE, width = 1, capStyle="ROUND")


up_booths_transform_buffer1km <- gBuffer(up_booths_transform_buffer1km, byid = TRUE, width = 0)
up_villages_transform <- gBuffer(up_villages_transform, byid = TRUE, width = 0)

#above is done as per suggestion on stackexchange because of the topology error that i get 
#if i do not include this
#Error in RGEOSBinTopoFunc(spgeom1, spgeom2, byid, id, drop_lower_td, unaryUnion_if_byid_false,  : 
#TopologyException: Input geom 1 is invalid: Self-intersection at or near point 
#1463.9798370253977 3002.5185235944564 at 1463.9798370253977 3002.5185235944564


#Error in RGEOSBinPredFunc(spgeom1, spgeom2, byid, func) : 
#rgeos_binpredfunc_prepared: maximum returned dense matrix size exceeded

#this runs perfectly on the small sample data i have attached but i get above message when
#i run this on the entire dataset 
pi <- intersect(up_booths_transform_buffer1km, up_villages_transform)

My super slow approach to solving this error is painful…

#district is a higher geographic area that is meaningul and present in both dataframes, i really i would like to avoid this because district names differ in the data frames and i would have to clean the names to match before i use this strategy below.
boothlist<-split(up_booths_transform_buffer1km,up_booths_transform_buffer1km$district_name_17)
villagelist<-split(up_villages_transform,up_villages_transform$DISTRICT)

listofintersection<-list()
y<-1
#THIS IS SUPER SLOW
for (i in 1:length(boothlist))
{
  for (z in 1:length(villagelist))
  {
    if(names(boothlist)[i]==names(villagelist)[z])
    {
      temp1<-boothlist[[i]]
      temp2<-villagelist[[z]]
      #temp1 <- gBuffer(temp1, byid=TRUE, width=0)
      #temp2 <- gBuffer(temp2, byid=TRUE, width=0)
      listofintersection[[y]] <- intersect(temp1, temp2)
      names(listofintersection)[y]<-names(boothlist)[i]
      print(names(listofintersection)[y])
      y<-y+1
      print(y)

    }
  }
}

Best Answer

Memory management in R is not perfect when it comes to spatial processing. Sometimes you have to try different packages to get your results.

You could try the sf library for this task:

library(sf)
up_booths_transform_buffer1km_sf <- st_as_sf(up_booths_transform_buffer1km)
up_villages_transform_sf <- st_as_sf(up_villages_transform)
pi_sf <- st_intersection(up_booths_transform_buffer1km_sf, up_villages_transform_sf)
plot(st_geometry(pi_sf))
pi_sf

Another helpful library may be RQGIS. Installation is a bit complex, but when you get it running, it can be a great help. Computation is done in QGIS, which has much better memory management than R.

library(RQGIS)
set_env(root = "C:/OSGeo4W64",
    new = TRUE)
# find_algorithms("intersect")
# get_usage("qgis:intersection")
# get_options("qgis:intersection")
params <- get_args_man(alg = "qgis:intersection")
params$INPUT <- up_booths_transform_buffer1km
params$INPUT2 <- up_villages_transform
params$OUTPUT <- "intersection_rqgis.shp"
pi_rqgis <- run_qgis(alg = "qgis:intersection",
                     params = params,
                     load_output = TRUE)
plot(st_geometry(pi_rqgis))
pi_rqgis

Both code snippets are working with your test data on my machine.

Related Question