I am trying to intersect two polygons that are SpatialPolygonsDataFrame. I am finally getting output after intersecting a small subset of the data frames. But when i run this on the entire dataset which is about 100,000 polygons in each of the data frame, i get an error message "rgeos_binpredfunc_prepared: maximum returned dense matrix size exceeded"
I tried splitting up the data frames using split and a group called district. I then ran intersect on a loop but that is so super slow.
sample dataset – https://drive.google.com/open?id=1Ha47ZqwG7bCuxExJk6zOs8KoqDiqD5A4
rm(list=ls())
##LOADING DATASETS REQUIRED FOR THIS
load("sampledata.Rdata")
#booths data - dataframe - upbooths_raw
#booths data - spatial file - up_booths_sample
#village data - up_villages_sample
#loading libraries that i might use
library(rgeos)
library(raster)
#new CRS which is good for Uttar Pradesh India
newcrs <- CRS("+proj=utm +zone=42N +datum=WGS84 +units=km")
#transforms booths and villages to new crs
up_booths_transform <- spTransform(up_booths, newcrs)
up_villages_transform<- spTransform(up_villages_sample,newcrs )
#create booths with 1km buffer
up_booths_transform_buffer1km <- gBuffer(up_booths_transform, byid = TRUE, width = 1, capStyle="ROUND")
up_booths_transform_buffer1km <- gBuffer(up_booths_transform_buffer1km, byid = TRUE, width = 0)
up_villages_transform <- gBuffer(up_villages_transform, byid = TRUE, width = 0)
#above is done as per suggestion on stackexchange because of the topology error that i get
#if i do not include this
#Error in RGEOSBinTopoFunc(spgeom1, spgeom2, byid, id, drop_lower_td, unaryUnion_if_byid_false, :
#TopologyException: Input geom 1 is invalid: Self-intersection at or near point
#1463.9798370253977 3002.5185235944564 at 1463.9798370253977 3002.5185235944564
#Error in RGEOSBinPredFunc(spgeom1, spgeom2, byid, func) :
#rgeos_binpredfunc_prepared: maximum returned dense matrix size exceeded
#this runs perfectly on the small sample data i have attached but i get above message when
#i run this on the entire dataset
pi <- intersect(up_booths_transform_buffer1km, up_villages_transform)
My super slow approach to solving this error is painful…
#district is a higher geographic area that is meaningul and present in both dataframes, i really i would like to avoid this because district names differ in the data frames and i would have to clean the names to match before i use this strategy below.
boothlist<-split(up_booths_transform_buffer1km,up_booths_transform_buffer1km$district_name_17)
villagelist<-split(up_villages_transform,up_villages_transform$DISTRICT)
listofintersection<-list()
y<-1
#THIS IS SUPER SLOW
for (i in 1:length(boothlist))
{
for (z in 1:length(villagelist))
{
if(names(boothlist)[i]==names(villagelist)[z])
{
temp1<-boothlist[[i]]
temp2<-villagelist[[z]]
#temp1 <- gBuffer(temp1, byid=TRUE, width=0)
#temp2 <- gBuffer(temp2, byid=TRUE, width=0)
listofintersection[[y]] <- intersect(temp1, temp2)
names(listofintersection)[y]<-names(boothlist)[i]
print(names(listofintersection)[y])
y<-y+1
print(y)
}
}
}
Best Answer
Memory management in R is not perfect when it comes to spatial processing. Sometimes you have to try different packages to get your results.
You could try the
sf
library for this task:Another helpful library may be
RQGIS
. Installation is a bit complex, but when you get it running, it can be a great help. Computation is done in QGIS, which has much better memory management than R.Both code snippets are working with your test data on my machine.