I am trying to write a point in polygon spatial join to file by county. In my code I read in a GeoJSON object of building polygons for a whole state, grab the centroid and split the centroid points by the states counties before writing them to file. The code works and does what I ask it to for the first state but then breaks when the state finishes and moves on to the next in a loop. Is there another way to do this? Should I set my geometry explicitly after each county loop? I have checked to make sure every geometry in the GeoPandas geometry column was a point but I am still receiving this error message: "ValueError:Geometry column cannot contain mutiple geometry types when writing to file."
import geopandas as gpd
import geopandas as gpd
#from fiona.crs import from_epsg
import pandas as pd
import time
import os
footprint_dir = 'D:\Data Repository\GIS Data\Building_Footprint\\'
county_dir = 'D:\Data Repository\GIS Data\Census_Shapefiles\County\\'
states = ['AZ', 'CA', 'CO', 'ID', 'NM', 'OR', 'TX', 'UT', 'WA']
state_names = ['Arizona', 'California', 'Colorado', 'Idaho', 'NewMexico',
'Oregon', 'Texas', 'Utah', 'Washington']
states = ['ID']
state_names = ["Idaho"]
counter = 0
t0 = time.time()
for state in state_names:
directory = footprint_dir + states[counter] + '\\'
if not os.path.exists(directory):
os.makedirs(directory)
footprint_fn = footprint_dir + state + '.geojson'
footprint_gdf = gpd.read_file(footprint_fn)
county_fn = county_dir + states[counter] + '_Counties\\' +
states[counter] + '_Counties.shp'
shape = gpd.read_file(county_fn)
shape.crs = {'init' :'epsg:4326'}
gdf = gpd.GeoDataFrame(footprint_gdf, geometry = 'geometry', crs =
{'init' :'epsg:4326'})
gdf_centroids = gdf.centroid
df = pd.DataFrame({'id': range(len(gdf_centroids)),
'geometry': gdf_centroids})
points = gpd.GeoDataFrame(df, geometry='geometry', crs = {'init' :
'EPSG:4326'})
pointInPolys = gpd.sjoin(points, shape, how='left', op='within')
pointInPolys = pointInPolys.dropna(thresh=2)
pointInPolys.crs = {'init' :'epsg:4326'}
pointInPolys.set_geometry(col = 'geometry', inplace=True)
#pointInPolys.geom_type = 'Point'
for county in pointInPolys.GEOID.unique():
countyfile = pointInPolys[pointInPolys.GEOID == county]
output_fn = directory + states[counter] + "_" + str(county)
countyfile.crs = {'init' :'epsg:4326'}
countyfile.to_file(output_fn)
print(output_fn)
counter += 1
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-85-6606253fd504> in <module>()
39 output_fn = directory + states[counter] + "_" + str(county)
40 countyfile.crs = {'init' :'epsg:4326'}
---> 41 countyfile.to_file(output_fn)
42 print(output_fn)
43 counter += 1
D:\Anaconda\envs\raster_processing\lib\site-packages\geopandas\geodataframe.py in to_file(self, filename, driver, schema, **kwargs)
363 """
364 from geopandas.io.file import to_file
--> 365 to_file(self, filename, driver, schema, **kwargs)
366
367 def to_crs(self, crs=None, epsg=None, inplace=False):
D:\Anaconda\envs\raster_processing\lib\site-packages\geopandas\io\file.py in to_file(df, filename, driver, schema, **kwargs)
58 """
59 if schema is None:
---> 60 schema = infer_schema(df)
61 filename = os.path.abspath(os.path.expanduser(filename))
62 with fiona.drivers():
D:\Anaconda\envs\raster_processing\lib\site-packages\geopandas\io\file.py in infer_schema(df)
88 geom_type = _common_geom_type(df)
89 if not geom_type:
---> 90 raise ValueError("Geometry column cannot contain mutiple "
91 "geometry types when writing to file.")
92
ValueError: Geometry column cannot contain mutiple geometry types when writing to file.
Best Answer
a lot going on here, but to start you can make points and reproject in a simpler way...instead of initializing new GDFs to get to points:
you can check that the CRS is what you want like so:
now you have
footprint_gdf
==points
that you are using in thesjoin
, without all the extra intermediate DFs.Without the files, my guess is that you're picking up NULL geometries because the filter that you've set isn't stringent enough here:
I'm not sure how many columns end up in the
pointInPolys
DF, but ifgeometry
is the only column that is NULL it will be kept through the above statement, which probably isn't what you want. I'd look there first. you can usesubset
keyword to just drop NULL values in the geometry column like so:Also, make sure that when you are projecting a DF you use the
to_crs
method and not an assignment like you do in your code. This changes the CRS without adjusting all of the geometries in it!