[GIS] Error message when joining two dataframes with sf - Error: y should be a data.frame; for spatial joins, use st_join

attribute-joins r sf spatial-join

In R, I'm trying to count the number of points that fall within a buffer around polygons and add the count to the original shapefile from which the polygon file was created. I've followed the answer provided by @Guzmán to this question: Counting number of points in polygon using R?. Everything works fine until it comes to adding the result to the original shapefile, which I want to do based on the attribute "ID", i.e. not a spatial join. I get the error message "Error: y should be a data.frame; for spatial joins, use st_join" even though y is a data frame (at least according to is.data.frame()).

```r
library(sf)
library(dplyr)

OrigPolys <- st_read("OrigPolys.shp") #Load polygons
Points <- st_read("Points.shp") #Load points
NewPolys <- subset(OrigPolys, X1 != 'NA') #Subset required polygons
Buffer <- st_buffer(NewPolys, 5000) #Create buffer NewPolys
inter <- st_intersection(Buffer, Points) #Find points in NewPolys
int_count <- inter %>% 
  group_by(ID) %>% 
  count() #Count number of Points in Buffer
as.data.frame(int_count) #Convert to df (not sure if this is required as is.data.frame = T whether included or not)
OrigPolys_Pts <- left_join(OrigPolys, int_count, by = "ID") #Join count to OrigPolys based on "ID" attribute
```

Best Answer

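Try `int_count <- st_drop_geometry(int_count)` before performing the join. Geometries are "sticky" in sf: the geometry column stays in the object unless you explicitly drop it. That is also why `is.data.frame()` is misleading here: an sf object inherits from data.frame, so the check returns TRUE, but the sf method for `left_join()` refuses a `y` that still has class sf. Note too that `as.data.frame(int_count)` on its own line has no effect, because its result is never assigned.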
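A minimal, self-contained sketch of the fix, using made-up square polygons and points instead of the asker's shapefiles (the objects `polys`, `pts` and the helper `sq()` are hypothetical). It also shows why `is.data.frame()` did not help: the counted object is a data frame, yet the attribute join rejects it while it still has class sf.

```r
library(sf)
library(dplyr)

# Hypothetical helper: a 10 x 10 square whose lower-left corner is (x0, 0)
sq <- function(x0) {
  st_polygon(list(rbind(c(x0, 0), c(x0 + 10, 0), c(x0 + 10, 10),
                        c(x0, 10), c(x0, 0))))
}

# Toy data: two polygons with an "ID" attribute, three points (two in A, one in B)
polys <- st_sf(ID = c("A", "B"), geometry = st_sfc(sq(0), sq(20)))
pts <- st_sf(pt = 1:3,
             geometry = st_sfc(st_point(c(2, 2)), st_point(c(5, 5)),
                               st_point(c(25, 5))))

inter <- st_intersection(polys, pts)                # one row per point, tagged with its polygon ID
int_count_sf <- inter %>% group_by(ID) %>% count()  # still an sf object

is.data.frame(int_count_sf)  # TRUE: class sf extends data.frame

# The attribute join refuses y while it still has class sf
# (the exact error wording depends on the sf version)
res <- tryCatch(left_join(polys, int_count_sf, by = "ID"),
                error = function(e) e)
inherits(res, "error")

# Drop the sticky geometry, then join on the "ID" attribute
int_count <- st_drop_geometry(int_count_sf)
polys_pts <- left_join(polys, int_count, by = "ID")
polys_pts$n
```

The result keeps the polygons' geometry and gains an `n` column; polygons with no points would get `NA` there.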

Related Solutions

[GIS] Counting unique occurrences during Spatial Join

The problem with your current method, and the reason that summarizing afterward (as @Branco suggests) would not work, is that your spatial join creates the first attribute you want (total points per polygon) while discarding the second variable you want to summarize (owner). To summarize, you need all of those variables in the same dataset. Right now your points have owners and names, and your polygons get a count. You need your points to carry a polygon name; then you can get owners by name by polygon.

Your data format also introduces a problem: name holds multiple values in a single field, and summarizing on that field treats each unique field value as a separate category. In other words, woods;house and house;woods are two different things. So are house and ;house;, for that matter. To avoid this, use a selection as the input to the summary and do not include that field as a case field.
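The answer describes ArcGIS tools, but the same pitfall is easy to see in R. In this sketch (the vector `name` is hypothetical example data), a frequency table treats each whole string as its own category, whereas splitting on the `;` delimiter lets you test each point for a single name.

```r
# Hypothetical multi-value name field: one ";"-delimited string per point
name <- c("woods;house", "house;woods", "house", ";house;")

# Summarizing on the raw field gives four distinct categories,
# although every value mentions "house"
table(name)

# Selecting by token instead: split each string on ";" and test for "house"
tokens <- strsplit(name, ";", fixed = TRUE)
has_house <- vapply(tokens, function(tok) "house" %in% tok, logical(1))
has_house
```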

Start by modifying your current spatial join and reversing it. Instead of being the join features, the points will be the target features, and the polygons will be the join features. The output of that join will be points with an attribute holding the [polygon ID] they fall in.
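A rough sf analogue of the reversed join (an R substitute for the ArcGIS Spatial Join tool described here, not that tool itself): with the points as `x` and the polygons as `y`, `st_join()` keeps one row per point and copies in the ID of the polygon it falls in. The data, the helper `sq()` and the column names `poly_id`, `owner` and `name` are hypothetical.

```r
library(sf)

# Hypothetical helper: a 10 x 10 square whose lower-left corner is (x0, 0)
sq <- function(x0) {
  st_polygon(list(rbind(c(x0, 0), c(x0 + 10, 0), c(x0 + 10, 10),
                        c(x0, 10), c(x0, 0))))
}
polys <- st_sf(poly_id = c("A", "B"), geometry = st_sfc(sq(0), sq(20)))

pts <- st_sf(owner = c("Q", "P", "Q"),
             name = c("woods;house", "house", "woods"),
             geometry = st_sfc(st_point(c(2, 2)), st_point(c(5, 5)),
                               st_point(c(25, 5))))

# Points are the target (x), polygons the join features (y):
# each point keeps its own attributes and gains the poly_id it falls in
pts_joined <- st_join(pts, polys, join = st_intersects)
pts_joined$poly_id
```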

Now add some steps to the process. Your spatial join output becomes the input for a Summary Statistics tool. To deal with the multi-name issue mentioned above, however, you first need to add (or repeat) a selection step, possibly via Make Feature Layer, to once again grab all points with the desired name string. Note that you are now working in a new dataset, the spatial join output, not your original point file.

Now plug that selection/feature layer into a Summary Statistics tool. Add [polygon ID] and [owner] as case fields (you must add them in that order). You can add any valid statistic field and type you want; we don't need its results. The output table should then list every unique combination of [owner] and [polygon ID] along with the [frequency] (number of times) it occurs. The frequency column should sum to the total number of points: for example, if Polygon A has Owner Q with frequency three (first row), Owner P with frequency one (second row) and Owner R with frequency six (third row), then 3+1+6=10 total points fall in Polygon A.
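The selection followed by the two-case-field summary can be sketched in base R on the join output, here held as a plain data frame called `joined` with hypothetical columns `poly_id`, `owner` and `name`. This is an R stand-in for the Summary Statistics tool; the data simply reproduce the worked numbers above (owners Q, P and R with 3, 1 and 6 points in Polygon A).

```r
# Hypothetical spatial-join output: 10 points in polygon A, all named "house"
joined <- data.frame(
  poly_id = "A",
  owner   = rep(c("Q", "P", "R"), times = c(3, 1, 6)),
  name    = "house",
  stringsAsFactors = FALSE
)

# Selection step: keep points whose ";"-delimited name contains "house"
keep <- vapply(strsplit(joined$name, ";", fixed = TRUE),
               function(tok) "house" %in% tok, logical(1))
sel <- joined[keep, ]

# Summary with case fields poly_id and owner: one row per combination,
# with the number of points (frequency) for that combination
freq <- aggregate(list(frequency = rep(1L, nrow(sel))),
                  by = sel[c("poly_id", "owner")], FUN = sum)
freq
sum(freq$frequency)  # equals the number of selected points in polygon A
```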

But you want to collapse that down to one record per polygon, so that output table becomes the input for a second Summary Statistics tool (no selection needed). This time [polygon ID] is the case field, and you'll have two statistics fields: [owner] with type count and [frequency] with type sum. The resulting table should have [polygon ID], [count owner], [sum frequency] and [frequency] (which should equal [count owner]).
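The second pass can be sketched the same way in base R (again an R stand-in for Summary Statistics). The first-stage table `freq` below is hypothetical; since each of its rows is already a unique owner/polygon combination, counting rows per `poly_id` gives the number of owners, and summing `frequency` gives the number of points.

```r
# Hypothetical first-stage table: one row per (poly_id, owner) with its frequency
freq <- data.frame(
  poly_id   = c("A", "A", "A", "B"),
  owner     = c("Q", "P", "R", "Q"),
  frequency = c(3L, 1L, 6L, 2L),
  stringsAsFactors = FALSE
)

# Case field poly_id; statistics: count of owner and sum of frequency
count_owner   <- aggregate(list(count_owner = freq$owner),
                           by = freq["poly_id"], FUN = length)
sum_frequency <- aggregate(list(sum_frequency = freq$frequency),
                           by = freq["poly_id"], FUN = sum)

# One record per polygon
per_poly <- merge(count_owner, sum_frequency, by = "poly_id")
per_poly
```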

That table now gives you the statistics you want for a single name. If you want them as attributes of the polygons, you can join that second Summary Statistics table to the polygons on [polygon ID] and export the result, or use the Join Field tool to append those attributes directly to the original polygon file.
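In R the equivalent attribute join is a plain key-based merge. This sketch uses base `merge()` on a hypothetical polygon attribute table `polys_attr` (its `zone` column is made up); for an sf polygon layer, `dplyr::left_join()` with a plain data frame as `y` does the same job.

```r
# Hypothetical polygon attribute table and second-stage summary table
polys_attr <- data.frame(poly_id = c("A", "B", "C"),
                         zone = c("north", "south", "east"),
                         stringsAsFactors = FALSE)
per_poly <- data.frame(poly_id = c("A", "B"),
                       count_owner = c(3L, 1L),
                       sum_frequency = c(10L, 2L),
                       stringsAsFactors = FALSE)

# Attribute join on poly_id; all.x = TRUE keeps polygons without points (as NA)
polys_stats <- merge(polys_attr, per_poly, by = "poly_id", all.x = TRUE)
polys_stats
```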

You'll then repeat the entire process for the next [name] string selection, just as your current step 4 does. At the end, merge all your polygon shapefiles into a single file.
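The per-name loop can be sketched in R as a function applied to each name, with the per-name results stacked into one table. This is an R analogue of the ModelBuilder iteration, not the answer's model; the data, the function `summarize_name()` and the column names are hypothetical, and the two summary passes are collapsed into direct per-polygon counts (distinct owners and number of points).

```r
# Hypothetical spatial-join output: poly_id, owner and a ";"-delimited name per point
joined <- data.frame(
  poly_id = c("A", "A", "A", "B", "B"),
  owner   = c("Q", "P", "Q", "Q", "R"),
  name    = c("woods;house", "house", "woods", "house", "woods"),
  stringsAsFactors = FALSE
)

# One pass of the workflow for a single name: select, then summarize per polygon
summarize_name <- function(df, target) {
  keep <- vapply(strsplit(df$name, ";", fixed = TRUE),
                 function(tok) target %in% tok, logical(1))
  sel <- df[keep, ]
  owners <- aggregate(list(count_owner = sel$owner), by = sel["poly_id"],
                      FUN = function(o) length(unique(o)))
  points <- aggregate(list(sum_frequency = sel$owner), by = sel["poly_id"],
                      FUN = length)
  out <- merge(owners, points, by = "poly_id")
  out$name <- target
  out
}

# Repeat for every name and stack the per-name results into one table
results <- do.call(rbind, lapply(c("house", "woods"), summarize_name, df = joined))
results
```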

You could build all of that into the model with an iterator and a submodel, Collect Values, and perhaps a dictionary, because of the multi-value, single-attribute nature of [name]. Otherwise, consider cleaning up the point data so that each point has only a single name value (points with more than one name become stacked points). That would allow direct use of Summary Statistics without any selections, although a selection would still be needed for your aggregate-to-polygons tool.
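The "stacked points" clean-up can be sketched in base R: each point is repeated once per name it carries, after which a single summary with [polygon ID], name and [owner] as grouping fields covers every name at once. The data and column names are hypothetical, and this is an R illustration rather than an ArcGIS workflow.

```r
# Hypothetical points with a multi-value name field
pts <- data.frame(
  poly_id = c("A", "A", "B"),
  owner   = c("Q", "P", "Q"),
  name    = c("woods;house", "house", "woods"),
  stringsAsFactors = FALSE
)

# Stack: one row per (point, single name); a point with two names becomes two rows
tokens  <- strsplit(pts$name, ";", fixed = TRUE)
stacked <- data.frame(
  point_id = rep(seq_len(nrow(pts)), lengths(tokens)),
  poly_id  = rep(pts$poly_id, lengths(tokens)),
  owner    = rep(pts$owner, lengths(tokens)),
  name     = unlist(tokens),
  stringsAsFactors = FALSE
)
stacked <- stacked[nzchar(stacked$name), ]  # drop empty tokens such as those from ";house;"

# With single-valued names, one summary covers every name, no selections needed
freq <- aggregate(list(frequency = stacked$point_id),
                  by = stacked[c("poly_id", "name", "owner")], FUN = length)
freq
```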

Combining st_join and st_nn for Points Within Polygon – R Guide

If I understood correctly, you want to find the containing polygon of each point or, if the point is not inside any polygon, the nearest polygon (up to 500 m away).

If so, the following expression, where the order of x and y is reversed, should work:

```r
library(sf)
library(nngeo)  # provides st_nn()

st_join(points, polygons, join = st_nn, k = 1, maxdist = 500)
```

The function looks for the nearest polygon to each point. The containing polygon, if any, always counts as nearest, since its distance from the point is zero. If no containing polygon is found, the function looks for the nearest polygon up to a maximum distance of 500 m.
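If you would rather avoid the nngeo dependency, the same logic can be approximated with sf alone. This is a substitute technique, not what `st_nn()` does internally: `st_nearest_feature()` picks the nearest polygon (a containing polygon is at distance zero, so it always wins), and a distance cut-off then discards matches farther than 500. The toy data, the helper `sq()` and `max_dist` are hypothetical; they are planar with no CRS, so 500 is in coordinate units. With real data, use a projected CRS in metres.

```r
library(sf)

# Hypothetical helper: a 10 x 10 square whose lower-left corner is (x0, 0)
sq <- function(x0) {
  st_polygon(list(rbind(c(x0, 0), c(x0 + 10, 0), c(x0 + 10, 10),
                        c(x0, 10), c(x0, 0))))
}

# Toy planar data with no CRS, so distances are in coordinate units
polygons <- st_sf(ID = c("A", "B"), geometry = st_sfc(sq(0), sq(2000)))
points <- st_sf(geometry = st_sfc(
  st_point(c(5, 5)),     # inside A (distance 0)
  st_point(c(310, 5)),   # 300 units from A, well over 500 from B
  st_point(c(1000, 5))   # 990 from A, 1000 from B: beyond the cut-off
))

max_dist <- 500
idx  <- st_nearest_feature(points, polygons)  # index of the nearest polygon per point
dist <- as.numeric(st_distance(points, polygons[idx, ], by_element = TRUE))

# Keep the nearest polygon's ID only when it lies within max_dist
points$ID <- ifelse(dist <= max_dist, as.character(polygons$ID)[idx], NA)
points$ID
```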
