[GIS] Avoiding changes to field types in GeoPandas

fields-attributes fiona geopandas python

I am using GeoPandas in the following routine to create the union of 2 (polygon) shapefiles:

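import geopandas as gpd

def union(fn_A, fn_B, output):
    # Read both polygon shapefiles into GeoDataFrames
    shpA = gpd.read_file(fn_A)
    shpB = gpd.read_file(fn_B)
    # Compute the union overlay; areas without overlap get missing attribute values
    result = gpd.overlay(shpA, shpB, how='union')
    result.to_file(output, driver='ESRI Shapefile')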

Shapefile 'A' has an integer attribute "a" and shapefile 'B' has an integer attribute "b". The resulting shapefile has attributes "a" and "b", but both of them are doubles. I have read that GeoPandas infers a schema when saving to files, but I do not know how to even get the schema of the inputs in the first place using GeoPandas. It appears that this only happens if the result of the union contains empty fields (i.e. areas where 'A' and 'B' don't overlap). If 'A' completely covers 'B', field "a" remains an integer.

How can I avoid this conversion?

Best Answer

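Areas where 'A' and 'B' don't overlap get missing values (NaN) in the other layer's attributes, and because NaN is a float, pandas upcasts a plain integer column to float64 as soon as it contains one. It is only since version 0.24 that pandas can handle integer columns with NaN, and it requires that the columns are cast to the new nullable type Int64 (capital "I") instead of the usual int64. See https://pandas.pydata.org/pandas-docs/version/0.24/whatsnew/v0.24.0.html#optional-integer-na-support. If you convert all your integer columns to Int64 before the overlay, it might work.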
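
The difference between int64 and Int64 can be seen with pandas alone. The following is only a minimal sketch, not the GeoPandas overlay itself: it assumes that a missing row added with reindex stands in for a non-overlapping area, and the series name "a" and its values are made up for illustration.

```python
import pandas as pd

# Integer column with no gaps: stored as plain NumPy int64.
values = pd.Series([1, 2, 3], name="a")

# Reindexing adds a row without a value, standing in for an area
# that exists in the other layer only (illustrative assumption).
with_gap = values.reindex([0, 1, 2, 3])
print(with_gap.dtype)  # float64, because the gap is stored as NaN

# The nullable extension type keeps the integers and marks the gap as <NA>.
nullable = values.astype("Int64").reindex([0, 1, 2, 3])
print(nullable.dtype)  # Int64
```

The first result is what ends up being written as a double field; the second keeps an integer type even though one value is missing.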

df['intcol'] = df['intcol'].astype('Int64')
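
If there are several integer attributes, the cast can be applied to all of them at once. This is a rough sketch: the helper name to_nullable_ints and the sample columns are hypothetical, and it is shown on a plain DataFrame. It should apply to a GeoDataFrame in the same way, but whether the integer type survives the overlay and the shapefile writer depends on the GeoPandas and Fiona versions, so check the output schema.

```python
import pandas as pd
from pandas.api.types import is_integer_dtype


def to_nullable_ints(df):
    """Return a copy of df with every integer column cast to nullable Int64.

    Hypothetical helper for illustration; other columns are left as they are.
    """
    int_cols = [col for col in df.columns if is_integer_dtype(df[col])]
    return df.astype({col: "Int64" for col in int_cols})


# Made-up attribute table standing in for one of the input layers.
frame = pd.DataFrame({"a": [1, 2], "name": ["x", "y"], "area": [0.5, 1.5]})
converted = to_nullable_ints(frame)
print(converted.dtypes)
```

In the routine from the question, this would be called on shpA and shpB right after reading them and before gpd.overlay.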