[GIS] Avoiding changes to field types in GeoPandas

fields-attributes, fiona, geopandas, python

I am using GeoPandas in the following routine to create the union of 2 (polygon) shapefiles:

import geopandas as gpd

def union(fn_A, fn_B, output):
    shpA = gpd.read_file(fn_A)    
    shpB = gpd.read_file(fn_B)
    union = gpd.overlay(shpA, shpB, how='union')
    union.to_file(driver='ESRI Shapefile', filename=output)

Shapefile 'A' has an integer attribute "a" and shapefile 'B' has an integer attribute "b". The resulting shapefile has attributes "a" and "b", but both of them are doubles. I have read that GeoPandas infers a schema when saving to files, but I do not know how to even get the schema of the inputs in the first place using GeoPandas. It appears that this only happens if the result of the union contains empty fields (i.e. areas where 'A' and 'B' don't overlap). If 'A' completely covers 'B', field "a" remains an integer.

How can I avoid this conversion?

Best Answer

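Pandas has only been able to hold integer columns containing NaN since version 0.24, and only if the columns are cast to the new nullable type Int64 (capital "I") instead of the usual int64; see https://pandas.pydata.org/pandas-docs/version/0.24/whatsnew/v0.24.0.html#optional-integer-na-support. Where 'A' and 'B' do not overlap, the union has no value for "a" or "b", so a plain int64 column is promoted to float to hold the NaN, and that is what gets written out as a double. If you convert all your integer columns to Int64 before the overlay, it might work.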

df['intcol'] = df['intcol'].astype('Int64')
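
The promotion to float can be reproduced with plain Pandas, without any shapefiles. The following is only a minimal sketch: the two tables and the "key" column are made-up stand-ins for the attribute tables of 'A' and 'B', and an outer merge is used to mimic the gaps a union overlay leaves where the inputs do not overlap. Whether gpd.overlay() and to_file() keep Int64 all the way to the output file depends on the GeoPandas and Pandas versions installed, so check the written schema afterwards.

```python
import pandas as pd

# Hypothetical attribute tables standing in for shapefiles 'A' and 'B'.
left = pd.DataFrame({"key": [1, 2, 3], "a": [10, 20, 30]})
right = pd.DataFrame({"key": [2, 3, 4], "b": [200, 300, 400]})

# With plain int64 columns, the missing values force a cast to float64.
plain = left.merge(right, on="key", how="outer")

# Casting to the nullable Int64 type first keeps the columns integer;
# the gaps are stored as <NA> instead of NaN.
nullable = left.astype({"a": "Int64"}).merge(
    right.astype({"b": "Int64"}), on="key", how="outer"
)

print(plain.dtypes)     # "a" and "b" are float64
print(nullable.dtypes)  # "a" and "b" are Int64
```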

Related Solutions

[GIS] Fiona + Shapely: Loading a set of LineStrings and writing their centroids to a shapefile, including original properties

"Then I put every record in the rows of a pandas.DataFrame"

Why?

If you only want to copy the original attributes (LineString) to the new shapefile (Points), after computing the centroid, you don't need Pandas:

import fiona
from shapely.geometry import shape, mapping

with fiona.open("polyline.shp") as src:
    # copy the schema and change only its geometry type: LineString -> Point
    schema = src.schema.copy()
    schema['geometry'] = "Point"
    # write the Point shapefile
    with fiona.open('centroid.shp', 'w', driver='ESRI Shapefile', schema=schema, crs=src.crs) as dst:
        for elem in src:
            # GeoJSON to shapely geometry
            geom = shape(elem['geometry'])
            # shapely centroid back to GeoJSON, original properties kept as they are
            dst.write({'geometry': mapping(geom.centroid),
                       'properties': dict(elem['properties'])})

If you absolutely want to use Pandas, use GeoPandas, which combines Pandas, Fiona and Shapely.

import geopandas as gp
gdf = gp.read_file('polyline.shp')
print(type(gdf))
# <class 'geopandas.geodataframe.GeoDataFrame'> -> a GeoDataFrame
print(gdf['geometry'])
# 0  LINESTRING (266351.05107 161433.039507, 266362...
# ....
# only change the geometry of the dataframe
gdf['geometry'] = gdf['geometry'].centroid
print(gdf['geometry'])
# 0    POINT (266369.1881962401 161457.6017265563)
# ....
# save resulting shapefile
gdf.to_file("centroids.shp")

GeoPandas – Resolving Saving GeoDataFrame Without Coordinate System

I have come across this behavior before.

You need to explicitly pass the well known text (crs_wkt) string to the to_file() method. The string will then get passed to fiona.open(), which writes out the .prj file.

Using your sample code, doing something like this:

import os
import geopandas as gpd

ws = r"D:\temp_se"
shp_file = gpd.datasets.get_path('naturalearth_lowres')
# the .prj file sits next to the .shp and holds the CRS as a WKT string
prj_file = shp_file.replace(".shp", ".prj")
with open(prj_file, 'r') as f:
    prj = f.readline().strip()
world = gpd.read_file(shp_file)
temp_shp = os.path.join(ws, "world_out.shp")
world.to_file(filename=temp_shp, driver='ESRI Shapefile', crs_wkt=prj)

should write world_out.shp together with its .prj file.

The read_file() and to_file() functions simply serve as wrapper functions. They call fiona.open(), which accepts a crs_wkt argument.

You need to explicitly pass a crs_wkt value when reading/writing files with geopandas.
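
The step of reading the WKT string from the .prj file can be wrapped in a small helper. This is just a sketch using the standard library: the function name read_prj_wkt is made up for illustration, and it assumes the .prj file sits next to the .shp file with the same base name and holds the WKT on its first line.

```python
from pathlib import Path


def read_prj_wkt(shp_path):
    """Return the WKT string stored in the .prj file next to a shapefile.

    Hypothetical helper: assumes the .prj shares the shapefile's base name
    and keeps the WKT on its first line.
    """
    prj_path = Path(shp_path).with_suffix(".prj")
    with open(prj_path, "r") as f:
        return f.readline().strip()
```

The returned string could then be passed as the crs_wkt value, for example crs_wkt=read_prj_wkt(path_to_input_shp), where path_to_input_shp is whatever shapefile the CRS should be copied from.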
