[GIS] Finding problematic rows – spatial join in GeoPandas

attribute-joins, error, geopandas, rtree, spatial-join

I have an sjoin call from geopandas that is behaving erratically: it works on some versions of the "points" GeoDataFrame but not on others.

merged = sjoin(points, polygons, how='left', op='within')

The error I get is always:

rtree.core.RTreeError: Coordinates must be in the form (minx, miny, maxx, maxy) or (x, y) for 2D indexes

The "polygons" GeoDataFrame never changes. The size of the "points" GeoDataFrame depends on how much data I want to include (set by a parameter). Generally the join fails when I include more data (e.g. 100,000 rows) and succeeds on smaller datasets (e.g. 2,000 rows). I assume this is because some rows contain invalid data. However, on visual inspection I cannot find anything wrong with any row.

Is there a way to quickly find out which rows are blocking the join, or to automatically ignore them?

I can't easily share the full code and data.

Best Answer

This error can occur for various reasons. Here are the ones I have experienced, along with their solutions:

  1. Your input data sets do not have clean sequential indices (i.e. there are gaps in the sequence due to prior exclusion of rows).

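I'm not sure exactly why this causes the error, but it can be resolved by calling

gdf = gdf.reset_index(drop=True)

on each input GeoDataFrame (gdf stands for either one) before applying sjoin. Note that reset_index is a method of the data frame, not of the pandas module, and that it returns a new frame, so the result has to be assigned back.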
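As a minimal sketch of that first fix, assuming geopandas and shapely are installed: the sample data and the column name "name" below are made up for illustration and are not from the question.

```python
import geopandas as gpd
from shapely.geometry import Point

# Hypothetical sample data standing in for the "points" GeoDataFrame.
points = gpd.GeoDataFrame(
    {"name": ["a", "b", "c", "d"]},
    geometry=[Point(0, 0), Point(1, 1), Point(2, 2), Point(3, 3)],
)

# Excluding rows leaves a gap in the index (0, 2, 3 instead of 0, 1, 2).
points = points[points["name"] != "b"]
index_before = list(points.index)

# reset_index returns a new frame, so assign the result back.
# drop=True discards the old index instead of keeping it as a column.
points = points.reset_index(drop=True)
index_after = list(points.index)
```

The same call would be applied to the polygons frame before passing both to sjoin.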

  2. There are invalid geometry objects in your polygons data frame.

If your polygons were drawn by hand (i.e. manually in a GIS), they may have overlaps or self-intersections that don't translate well further along in the process. Your polygons could also be empty, which can happen in PostGIS with complex function sequences.

The solution is to ensure that all your polygons are of the correct type and are valid. In PostGIS you can use the functions ST_IsValid and ST_IsEmpty to check for this, then remove or amend any problem rows. You should also check that you have Polygons or MultiPolygons, not GeometryCollections.
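If the data is already loaded in GeoPandas, similar checks can be done there rather than in PostGIS. The following is a rough sketch, assuming a reasonably recent geopandas and shapely. The sample polygons and the "name" column are invented for illustration. It flags rows whose geometry is missing, empty, invalid, or of an unexpected type.

```python
import geopandas as gpd
from shapely.geometry import Polygon

# Hypothetical sample data: a valid square, a self-intersecting "bow-tie",
# an empty polygon and a missing geometry.
polygons = gpd.GeoDataFrame(
    {"name": ["square", "bowtie", "empty", "missing"]},
    geometry=[
        Polygon([(0, 0), (1, 0), (1, 1), (0, 1)]),
        Polygon([(0, 0), (1, 1), (1, 0), (0, 1)]),
        Polygon(),
        None,
    ],
)

geom = polygons.geometry
# Flag anything that is not a Polygon or MultiPolygon (e.g. GeometryCollection).
wrong_type = ~geom.geom_type.isin(["Polygon", "MultiPolygon"])
# Combine the checks: missing, empty, invalid, or wrong type.
is_problem = geom.isna() | geom.is_empty | ~geom.is_valid | wrong_type

# Rows to inspect by hand.
problems = polygons[is_problem]
bad_names = list(problems["name"])

# One option: drop the flagged rows and reset the index before the join.
clean = polygons[~is_problem].reset_index(drop=True)
```

The missing, empty and invalid checks could also be run on the points frame to list the rows worth inspecting there, although I have not confirmed which of these conditions triggers the rtree error in your case.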