I'm working with two ArcGIS spatially enabled dataframes and am trying to spatially join them.
parcels_sdf: a frame with around 370,000 real estate parcels.
subdivisions_sdf: a frame with all approved subdivisions in my county.
Both frames have non-null values in their SHAPE columns, which I believe means they have valid geometries (how can I check this?). What I want to end up with is every parcel associated with the subdivision whose geometry it lies within.
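A non-null SHAPE value only means each row holds a geometry object; it does not guarantee that the geometry is valid or non-empty. A minimal sketch of how such a check could look in GeoPandas, using a hypothetical three-row frame (a square, a self-intersecting "bowtie", and an empty polygon) rather than the real parcel data:

```python
import geopandas as gpd
from shapely.geometry import Polygon, box
from shapely.validation import explain_validity

# Hypothetical toy layer standing in for the parcels; not the county's data.
gdf = gpd.GeoDataFrame(
    {"PIN_NUM": ["A", "B", "C"]},
    geometry=[
        box(0, 0, 1, 1),                            # valid square
        Polygon([(0, 0), (2, 2), (2, 0), (0, 2)]),  # self-intersecting "bowtie"
        Polygon(),                                  # empty geometry
    ],
)

# Non-null is not the same as valid: flag missing, empty, and invalid shapes separately.
report = gdf.assign(
    missing=gdf.geometry.isna(),
    empty=gdf.geometry.is_empty,
    valid=gdf.geometry.is_valid,
)
print(report[["PIN_NUM", "missing", "empty", "valid"]])

# Ask GEOS why each invalid, non-empty geometry fails.
bad = gdf[~gdf.geometry.is_valid & ~gdf.geometry.is_empty]
for pin, geom in zip(bad["PIN_NUM"], bad.geometry):
    print(pin, explain_validity(geom))
```

On a real layer, the rows where `missing` or `empty` is True, or `valid` is False, are the candidates to inspect before joining.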
parcels_sdf.info()
RangeIndex: 376926 entries, 0 to 376925
Data columns (total 60 columns):
OBJECTID 376926 non-null int64
PIN_NUM 376893 non-null object
CALC_AREA 376926 non-null float64
REID 376691 non-null object
MAP_NAME 376691 non-null object
OWNER 376691 non-null object
ADDR1 376691 non-null object
ADDR2 376684 non-null object
ADDR3 16232 non-null object
.
.
.
LAND_CODE 376092 non-null object
SHAPE 376926 non-null geometry
dtypes: datetime64[ns](2), float64(21), geometry(1), int64(1), object(35)
memory usage: 172.5+ MB
subdivisions_sdf.info()
RangeIndex: 5503 entries, 0 to 5502
Data columns (total 18 columns):
OBJECTID 5503 non-null int64
ACCESS_RD 5351 non-null object
NAME 5503 non-null object
APPROVDATE 5190 non-null datetime64[ns]
ACRES 5503 non-null float64
.
.
LAST_EDITED_DATE 5503 non-null datetime64[ns]
SHAPE 5503 non-null geometry
dtypes: datetime64[ns](3), float64(6), geometry(1), int64(1), object(7)
memory usage: 774.0+ KB
When I try to do the join:
joined_sdf = parcels_sdf.spatial.join(subdivisions_sdf,
                                      how='inner', op='within',
                                      left_tag='parcel', right_tag='subdivision')
I get the following error:
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
in <module>
----> 1 joined_sdf = parcels_sdf.spatial.join(subdivisions_sdf, how='inner', op='within', left_tag='parcel', right_tag='subdivision')

~\Anaconda3\lib\site-packages\arcgis\features\geo_accessor.py in join(self, right_df, how, op, left_tag, right_tag)
   1089             left_df, right_df = right_df, left_df
   1090
-> 1091         tree_idx = right_df.spatial.sindex("quadtree")
   1092
   1093         idxmatch = (left_df[self.name]

~\Anaconda3\lib\site-packages\arcgis\features\geo_accessor.py in sindex(self, stype, reset, **kwargs)
   2095                     self._sindex.insert(oid=idx, bbox=gext)
                        else:
-> 2097                     self._sindex.insert(oid=idx, bbox=g.geoextent)
   2098                 if c >= int(l/4) + 1:
   2099                     self._sindex.flush()

~\Anaconda3\lib\site-packages\arcgis\features\geo_index_impl.py in insert(self, oid, bbox)
    108             return r
    109         elif self._stype.lower() == 'quadtree':
--> 110             return self._index.insert(item=oid, bbox=bbox)
    111         elif self._stype.lower() == 'custom':
    112             r = self._index.intersect(oid, bbox)

~\Anaconda3\lib\site-packages\arcgis\features\geo_index\quadtree.py in insert(self, item, bbox)
    237         - bbox: The spatial bounding box tuple of the item, with four members (xmin,ymin,xmax,ymax)
    238         """
--> 239         self._insert(item, bbox)
    240
    241     def intersect(self, bbox):

~\Anaconda3\lib\site-packages\arcgis\features\geo_index\quadtree.py in _insert(self, item, bbox)
     85
     86     def _insert(self, item, bbox):
---> 87         rect = _normalize_rect(bbox)
     88         if len(self.children) == 0:
     89             node = _QuadNode(item, rect)

~\Anaconda3\lib\site-packages\arcgis\features\geo_index\quadtree.py in _normalize_rect(rect)
     40
     41 def _normalize_rect(rect):
---> 42     x1, y1, x2, y2 = rect
     43     if x1 > x2:
     44         x1, x2 = x2, x1

TypeError: cannot unpack non-iterable NoneType object
Since the error is raised in the quadtree module, I think there is bad data in my right frame (subdivisions_sdf), but I can't find a way to troubleshoot it. Does anyone see what I'm doing wrong?
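The last frame of the traceback unpacks each geometry's bounding box into four values, so the error means at least one extent came back as None. A minimal stdlib sketch that reproduces that failure and screens a list of (OBJECTID, extent) pairs for missing boxes; `normalize_rect` and `find_missing_extents` are hypothetical names, and the sample extents are made up:

```python
from typing import Iterable, List, Optional, Sequence, Tuple

BBox = Tuple[float, float, float, float]


def normalize_rect(rect: BBox) -> BBox:
    # Mirrors the unpacking step in the traceback: a None rect raises
    # "TypeError: cannot unpack non-iterable NoneType object".
    x1, y1, x2, y2 = rect
    if x1 > x2:
        x1, x2 = x2, x1
    # Assumed: the traceback only shows the x swap; the y swap is added for symmetry.
    if y1 > y2:
        y1, y2 = y2, y1
    return x1, y1, x2, y2


def find_missing_extents(
    extents: Iterable[Tuple[int, Optional[Sequence[float]]]]
) -> List[int]:
    """Return the ids whose extent is None or is not a 4-value box."""
    bad = []
    for oid, ext in extents:
        if ext is None or len(ext) != 4:
            bad.append(oid)
    return bad


# Hypothetical extents keyed by OBJECTID; row 3 mimics a geometry with no extent.
sample = [(1, (0.0, 0.0, 1.0, 1.0)), (2, (5.0, 5.0, 2.0, 2.0)), (3, None)]
print(find_missing_extents(sample))

try:
    normalize_rect(None)
except TypeError as err:
    print(err)
```

With the real frame, the input could be built with something like `list(zip(subdivisions_sdf["OBJECTID"], (g.geoextent for g in subdivisions_sdf["SHAPE"])))`. `geoextent` is the attribute the traceback calls, but whether it is None for every problem shape is an assumption worth verifying. Because `join` may swap the two frames before building the index (line 1089), it is probably worth screening both frames.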
Best Answer
I got a very helpful response from my county's GIS help desk (Wake County, NC). Brandon pointed me to the solution: "As to your other point, the multi-part polygon issue could be solved by dissolving the shapes (Dissolve in ArcGIS or Dissolve in GeoPandas). Dissolve will combine disparate polygons together based on an attribute field – in this case, the PIN_NUM field. That will create a dataset that loses the rest of the parcel attributes, but they can be easily joined back to the new dataset."
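A minimal GeoPandas sketch of the dissolve-then-rejoin step described in the quote, on hypothetical toy parcels where PIN "P1" is stored as two polygon rows. Note that GeoPandas `dissolve` aggregates other columns with `aggfunc="first"` by default, so the explicit join-back below only matters if you dissolve just the key column, which is the behaviour the help desk describes:

```python
import geopandas as gpd
from shapely.geometry import box

# Hypothetical parcels: PIN "P1" is split across two separate polygon rows.
parcels = gpd.GeoDataFrame(
    {
        "PIN_NUM": ["P1", "P1", "P2"],
        "OWNER": ["Smith", "Smith", "Jones"],
    },
    geometry=[box(0, 0, 1, 1), box(2, 0, 3, 1), box(5, 5, 6, 6)],
)

# Merge all rows sharing a PIN_NUM into one (possibly multi-part) geometry,
# keeping only the key column, as in the help desk's description.
dissolved = parcels[["PIN_NUM", "geometry"]].dissolve(by="PIN_NUM").reset_index()

# Re-attach the remaining attributes, one row per PIN_NUM.
attrs = parcels.drop(columns="geometry").drop_duplicates(subset="PIN_NUM")
parcels_one_per_pin = dissolved.merge(attrs, on="PIN_NUM", how="left")

print(parcels_one_per_pin)
```

Here "P1" ends up as a single MultiPolygon row with its OWNER value restored.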
I found that some parcels had multiple polygons (I think this is referred to as the bowtie problem). I ended up converting the ArcGIS SDF to a standard GeoPandas GeoDataFrame and then doing the dissolve and the spatial join.
This worked for me:
I still get some invalid geometries, but the spatial join now runs.
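A minimal GeoPandas sketch of the dissolve-then-join approach, not the poster's own code. The toy parcels and the subdivision names SUB_A and SUB_B are hypothetical, both layers are assumed to share a CRS, and `predicate=` assumes GeoPandas 0.10 or later (older releases used `op=`):

```python
import geopandas as gpd
from shapely.geometry import box

# Hypothetical stand-ins for the two layers (same CRS assumed).
parcels = gpd.GeoDataFrame(
    {"PIN_NUM": ["P1", "P1", "P2", "P3"]},
    geometry=[box(1, 1, 2, 2), box(3, 1, 4, 2), box(12, 1, 13, 2), box(30, 30, 31, 31)],
)
subdivisions = gpd.GeoDataFrame(
    {"NAME": ["SUB_A", "SUB_B"]},
    geometry=[box(0, 0, 10, 10), box(10, 0, 20, 10)],
)

# 1. Collapse multi-row parcels into one geometry per PIN_NUM.
parcels_d = parcels.dissolve(by="PIN_NUM").reset_index()

# 2. Drop rows with missing or empty geometry so they cannot break the join.
parcels_d = parcels_d[parcels_d.geometry.notna() & ~parcels_d.geometry.is_empty]

# 3. Keep only parcels that fall entirely within a subdivision.
joined = gpd.sjoin(parcels_d, subdivisions, how="inner", predicate="within")
print(joined[["PIN_NUM", "NAME"]])
```

With `how="inner"`, parcels outside every subdivision (P3 here) are dropped. `how="left"` would keep them with a missing NAME instead.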