[GIS] CSV to GeoDataFrame: how to get valid geometry objects

csv geodataframe geometry geopandas

I'm writing a script with GeoPandas. I want to use a CSV of blocks in a spatial join, so I convert it to a GeoDataFrame. But when I try to set the geometry column, it fails with: Input geometry column must contain valid geometry objects.

Here is my code to import the CSV file:

import pandas
import geopandas as gpd

csv_df = pandas.read_csv(csv_file)
csv_gdf = gpd.GeoDataFrame(csv_df)
csv_gdf = csv_gdf.set_geometry('geometry')

Here is csv_gdf.head() before I try to set the geometry column:

        id      name shortName  accountId  isMonitored  varietalId  ranchId  \
0  14633.0    HC4bas      HC4b      346.0        False         4.0    855.0
1  14634.0   HC3haut      HC3h      346.0        False         4.0    855.0
2  14637.0      HC12      HC12      346.0        False         2.0    855.0
3  14638.0  HC11haut      HC11      346.0        False        72.0    855.0
4  14641.0    HC9bas      HC9b      346.0        False         4.0    855.0

   inRowDistance  betweenRowDistance  \
0            1.2                 1.5   
1            1.2                 1.5   
2            1.2                 1.5   
3            0.9                 1.5   
4            1.2                 1.5   

                                        geometry       ...         \
0  POLYGON ((-0.1642995066034836 44.9397295596186...       ...          
1  POLYGON ((-0.1634854066129132 44.9405302549332...       ...          
2  POLYGON ((-0.1624824342183362 44.9398350833047...       ...          
3  POLYGON ((-0.1592356652491378 44.9399712591478...       ...          
4  POLYGON ((-0.1610166332996532 44.9391145465108...       ...          

   slopeInclination  slopeOrientation  soilType  rowOrientation  dashboardId  \
0               NaN               NaN       NaN             NaN          NaN
1               NaN               NaN       NaN             NaN          NaN
2               NaN               NaN       NaN             NaN          NaN
3               NaN               NaN       NaN             NaN          NaN
4               NaN               NaN       NaN             NaN          NaN

   canopySystemId  canopyWidth  topWireHeight  clusterWireHeight  \
0             NaN         0.45            NaN                NaN   
1             NaN         0.45            NaN                NaN   
2             NaN         0.45            NaN                NaN   
3             NaN         0.45            NaN                NaN   
4             NaN         0.45            NaN                NaN   

   pruningSystemId
0              NaN  
1              NaN  
2              NaN  
3              NaN  
4              NaN  

[5 rows x 27 columns]

Best Answer

You probably have invalid geometries in your dataset. To find them, you can either load your CSV into QGIS and run Vector -> Geometry Tools -> Check validity

or loop through your dataframe to find the invalid geometries:

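for index, row in csv_gdf.iterrows():
    geom = row['geometry']  # must already be a shapely geometry, not a WKT string
    # a Polygon has no .coords of its own; its vertices are on the exterior ring
    if len(geom.exterior.coords) <= 2:
        print("Row {} has an invalid polygon geometry".format(index))
        # this is just one example of invalid geometries, there are also overlapping vertices, ...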

I would recommend the first check, even though QGIS is not tagged in your question.
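For a scripted check that covers more than the vertex count, here is a minimal sketch using shapely's is_valid property and shapely.validation.explain_validity. The two sample WKT strings are made up for illustration and are not taken from the question's data; the check only works on shapely geometry objects, not on raw WKT strings.

```python
from shapely.validation import explain_validity
from shapely.wkt import loads

# Hypothetical sample data: one valid square and one self-intersecting "bow-tie".
sample_wkts = [
    "POLYGON ((0 0, 1 0, 1 1, 0 1, 0 0))",
    "POLYGON ((0 0, 2 2, 2 0, 0 2, 0 0))",
]
geoms = [loads(text) for text in sample_wkts]

invalid = []
for position, geom in enumerate(geoms):
    if not geom.is_valid:
        invalid.append(position)
        # explain_validity returns a short human-readable reason
        print(position, explain_validity(geom))
```

After the loop, invalid holds the positions of the geometries that failed the check.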

EDIT: the geometry column read from the CSV holds WKT strings, so generate the geometry as shapely.geometry objects first:

from shapely.wkt import loads

# either all at once :
csv_gdf['geometry'] = csv_gdf['geometry'].apply(loads)

# or one by one to detect possible geometry errors
for index, row in csv_gdf.iterrows():
    # it will throw an error where the geometry WKT isn't valid
    # csv_gdf.set_value(index, 'geometry', loads(row['geometry'])) --> deprecated
    # .at is the scalar setter that replaces set_value
    csv_gdf.at[index, 'geometry'] = loads(row['geometry'])
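Putting the steps together, here is a rough end-to-end sketch: read the CSV, parse the WKT column while recording the rows that fail, then build the GeoDataFrame. The helper name parse_wkt_column, the inline sample CSV, and the EPSG:4326 CRS are assumptions for illustration (the question does not state a CRS; the coordinates only look like longitude/latitude).

```python
import io

import geopandas as gpd
import pandas as pd
from shapely import wkt


def parse_wkt_column(values):
    """Hypothetical helper: parse WKT strings, return (geometries, bad_positions)."""
    geoms, bad = [], []
    for pos, text in enumerate(values):
        try:
            geoms.append(wkt.loads(text))
        except Exception:  # the exact exception class depends on the Shapely version
            geoms.append(None)
            bad.append(pos)
    return geoms, bad


# Made-up stand-in for the real CSV file; the second row is deliberately broken.
sample_csv = io.StringIO(
    'id,geometry\n'
    '1,"POLYGON ((0 0, 1 0, 1 1, 0 1, 0 0))"\n'
    '2,"not a geometry"\n'
)
csv_df = pd.read_csv(sample_csv)

geoms, bad_rows = parse_wkt_column(csv_df['geometry'])
csv_df['geometry'] = geoms

# The column now holds shapely objects (or None), so it can be used as geometry.
# EPSG:4326 is an assumption, not something stated in the question.
csv_gdf = gpd.GeoDataFrame(csv_df, geometry='geometry', crs='EPSG:4326')
print("rows with unparseable WKT:", bad_rows)
```

Rows listed in bad_rows end up with a missing geometry, so they can be inspected or dropped before the spatial join.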