[GIS] GeoPandas not recognizing geometry type

geopandaspythonshapefilevalueerror

I am trying to write a point in polygon spatial join to file by county. In my code I read in a GeoJSON object of building polygons for a whole state, grab the centroid and split the centroid points by the states counties before writing them to file. The code works and does what I ask it to for the first state but then breaks when the state finishes and moves on to the next in a loop. Is there another way to do this? Should I set my geometry explicitly after each county loop? I have checked to make sure every geometry in the GeoPandas geometry column was a point but I am still receiving this error message: "ValueError:Geometry column cannot contain mutiple geometry types when writing to file."

import geopandas as gpd
import geopandas as gpd
#from fiona.crs import from_epsg
import pandas as pd
import time
import os

footprint_dir = 'D:\Data Repository\GIS Data\Building_Footprint\\'
county_dir = 'D:\Data Repository\GIS Data\Census_Shapefiles\County\\'

states = ['AZ', 'CA', 'CO', 'ID', 'NM', 'OR', 'TX', 'UT', 'WA']
state_names = ['Arizona', 'California', 'Colorado', 'Idaho', 'NewMexico', 
'Oregon', 'Texas', 'Utah', 'Washington']

states = ['ID']
state_names = ["Idaho"]
counter = 0
t0 = time.time()
for state in state_names:
    directory = footprint_dir + states[counter] + '\\'
    if not os.path.exists(directory):
        os.makedirs(directory)

    footprint_fn = footprint_dir + state + '.geojson'
    footprint_gdf = gpd.read_file(footprint_fn)
    county_fn = county_dir + states[counter] + '_Counties\\' + 
    states[counter] + '_Counties.shp'
    shape = gpd.read_file(county_fn)
    shape.crs = {'init' :'epsg:4326'}
    gdf = gpd.GeoDataFrame(footprint_gdf, geometry = 'geometry', crs = 
        {'init' :'epsg:4326'})
    gdf_centroids = gdf.centroid
    df = pd.DataFrame({'id': range(len(gdf_centroids)),
               'geometry': gdf_centroids})
    points = gpd.GeoDataFrame(df, geometry='geometry', crs = {'init' : 
        'EPSG:4326'})
    pointInPolys = gpd.sjoin(points, shape, how='left', op='within')
    pointInPolys = pointInPolys.dropna(thresh=2)
    pointInPolys.crs = {'init' :'epsg:4326'}
    pointInPolys.set_geometry(col = 'geometry', inplace=True)
    #pointInPolys.geom_type = 'Point'
    for county in pointInPolys.GEOID.unique():
        countyfile = pointInPolys[pointInPolys.GEOID == county]
        output_fn = directory + states[counter] + "_" + str(county)
        countyfile.crs = {'init' :'epsg:4326'}
        countyfile.to_file(output_fn)
        print(output_fn)
    counter += 1
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-85-6606253fd504> in <module>()
     39         output_fn = directory + states[counter] + "_" + str(county)
     40         countyfile.crs = {'init' :'epsg:4326'}
---> 41         countyfile.to_file(output_fn)
     42         print(output_fn)
     43     counter += 1

D:\Anaconda\envs\raster_processing\lib\site-packages\geopandas\geodataframe.py in to_file(self, filename, driver, schema, **kwargs)
    363         """
    364         from geopandas.io.file import to_file
--> 365         to_file(self, filename, driver, schema, **kwargs)
    366 
    367     def to_crs(self, crs=None, epsg=None, inplace=False):

D:\Anaconda\envs\raster_processing\lib\site-packages\geopandas\io\file.py in to_file(df, filename, driver, schema, **kwargs)
     58     """
     59     if schema is None:
---> 60         schema = infer_schema(df)
     61     filename = os.path.abspath(os.path.expanduser(filename))
     62     with fiona.drivers():

D:\Anaconda\envs\raster_processing\lib\site-packages\geopandas\io\file.py in infer_schema(df)
     88     geom_type = _common_geom_type(df)
     89     if not geom_type:
---> 90         raise ValueError("Geometry column cannot contain mutiple "
     91                          "geometry types when writing to file.")
     92 

ValueError: Geometry column cannot contain mutiple geometry types when writing to file.

Best Answer

a lot going on here, but to start you can make points and reproject in a simpler way...instead of initializing new GDFs to get to points:

footprint_gdf = gpd.read_file(footprint_fn) 
footprint_gdf.geometry = footprint_gdf.centroid 

you can check that the CRS is what you want like so:

if footprint_gdf.crs != {'init' :'epsg:4326'}: # this will re-project only if necessary
    footprint_gdf.to_crs({'init' :'epsg:4326'}, inplace=True)

now you have footprint_gdf == points that you are using in the sjoin, without all the extra intermediate DFs.

Without the files, my guess is that you're picking up NULL geometries because the filter that you've set isn't stringent enough here:

pointInPolys = pointInPolys.dropna(thresh=2)

I'm not sure how many columns end up in the pointInPolys DF, but if geometry is the only column that is NULL it will be kept through the above statement, which probably isn't what you want. I'd look there first. you can use subset keyword to just drop NULL values in the geometry column like so:

pointInPolys = pointInPolys.dropna(subset=['geometry']) 

Also, make sure that when you are projecting a DF you use the to_crs method and not an assignment like you do in your code. This changes the CRS without adjusting all of the geometries in it!

Related Question