Python – Using Shapely Methods (explain_validity and make_valid) on Shapefile

Tags: geodataframe, geopandas, polygon, python, shapely

I'm trying to find and repair invalid polygons. QGIS has already shown me that 10 polygons have self-intersection problems, and I'm now trying to do the same check with shapely.

This is my code:

import geopandas as gpd
from shapely.validation import explain_validity

data = gpd.read_file('land.shp')
explain_validity(data)

It returns an error:

AttributeError                            Traceback (most recent call last)
Input In [89], in <module>
      3 data=gpd.read_file('land.shp')
      4 data
----> 5 explain_validity(data)

File ~\.conda\envs\geopandas_env\lib\site-packages\shapely\validation.py:26, in explain_validity(ob)
      8 def explain_validity(ob):
      9     """
     10     Explain the validity of the input geometry, if it is invalid.
     11     This will describe why the geometry is invalid, and might
   (...)
     24 
     25     """
---> 26     return lgeos.GEOSisValidReason(ob._geom)

File ~\.conda\envs\geopandas_env\lib\site-packages\pandas\core\generic.py:5583, in NDFrame.__getattr__(self, name)
   5576 if (
   5577     name not in self._internal_names_set
   5578     and name not in self._metadata
   5579     and name not in self._accessors
   5580     and self._info_axis._can_hold_identifiers_and_holds_name(name)
   5581 ):
   5582     return self[name]
-> 5583 return object.__getattribute__(self, name)

AttributeError: 'GeoDataFrame' object has no attribute '_geom'

The geometry column is named 'geometry'. What is wrong?
Is it possible to get a result like this:

   id       geometry                             explain_validity
0   1  MULTILINESTRING ((98573.148 61104...  Ring Self-intersection

Best Answer

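The two functions above (explain_validity and make_valid) are applied not to the shapefile itself (or the GeoDataFrame read from it), but to the individual objects (the geometries of the features) that it contains. That is why passing the whole GeoDataFrame raises the AttributeError shown in the question.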

The documentation describes explain_validity() as follows:

Returns a string explaining the validity or invalidity of the object.

The messages may or may not have a representation of a problem point that can be parsed out.

To apply it to the geometry of every feature, one can use the following code:

import geopandas as gpd
from shapely.validation import explain_validity

absolute_path_to_file = 'P:/Test/qgis_test/lines_test.shp'

shp = gpd.read_file(absolute_path_to_file)

shp['validity'] = shp.apply(lambda row: explain_validity(row.geometry), axis=1)

print(shp)

That will give this:

   id                                           geometry        validity
0   1  MULTILINESTRING ((98573.148 6110418.666, 40758...  Valid Geometry
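
The test file above happens to contain only a valid line, so the output does not show what an invalid feature looks like. The following is a minimal sketch of the same idea on in-memory data; the two polygons and the column name explain_validity are made up for illustration, standing in for the asker's land.shp.

```python
import geopandas as gpd
from shapely.geometry import Polygon
from shapely.validation import explain_validity

# Hypothetical data standing in for the shapefile from the question.
bowtie = Polygon([(0, 0), (2, 2), (2, 0), (0, 2)])  # two edges cross at (1, 1)
square = Polygon([(0, 0), (1, 0), (1, 1), (0, 1)])  # an ordinary valid polygon

gdf = gpd.GeoDataFrame({"id": [1, 2]}, geometry=[bowtie, square])

# explain_validity accepts a single geometry, so map it over the geometry column
# instead of passing the whole GeoDataFrame.
gdf["explain_validity"] = gdf.geometry.apply(explain_validity)

print(gdf[["id", "explain_validity"]])
```

The first row should carry a self-intersection message and the second row 'Valid Geometry'; the exact wording of the message comes from GEOS and may differ between versions.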

The make_valid() function, which the documentation describes as follows:

Returns a valid representation of the geometry, if it is invalid. If it is valid, the input geometry will be returned.

In many cases, in order to create a valid geometry, the input geometry must be split into multiple parts or multiple geometries. If the geometry must be split into multiple parts of the same geometry type, then a multi-part geometry (e.g. a MultiPolygon) will be returned. If the geometry must be split into multiple parts of different types, then a GeometryCollection will be returned.

is useful together with the is_valid attribute, about which the documentation notes:

The validity test is meaningful only for Polygons and MultiPolygons. True is always returned for other types of geometries.

The following code repairs only the geometries that are not valid:

import geopandas as gpd
from shapely.validation import make_valid

absolute_path_to_file = 'P:/Test/qgis_test/lines_test.shp'

shp = gpd.read_file(absolute_path_to_file)

shp.geometry = shp.apply(lambda row: make_valid(row.geometry) if not row.geometry.is_valid else row.geometry, axis=1)

print(shp)

That will return this:

   id                                           geometry
0   1  MULTILINESTRING ((98573.148 6110418.666, 40758...
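
Because the test file holds a valid line, the output above is unchanged by the repair. To see the splitting behaviour mentioned in the quoted documentation, here is a small sketch on a single self-intersecting polygon; the "bowtie" coordinates are invented for illustration and are not taken from the asker's data.

```python
from shapely.geometry import Polygon
from shapely.validation import explain_validity, make_valid

# A "bowtie": the ring crosses itself at (1, 1), so the polygon is invalid.
bowtie = Polygon([(0, 0), (2, 2), (2, 0), (0, 2)])
print(bowtie.is_valid, explain_validity(bowtie))

# make_valid returns a new geometry; the input object is left untouched.
repaired = make_valid(bowtie)
print(repaired.geom_type, repaired.is_valid)
print(repaired.wkt)
```

Here the repair is expected to split the bowtie into its two triangles and return them as one multi-part geometry, which is why the geometry type of a column can change after make_valid.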

To include the validity check for other geometry types, one can combine both functions (explain_validity and make_valid) in one line:

import geopandas as gpd
from shapely.validation import make_valid, explain_validity

absolute_path_to_file = 'P:/Test/qgis_test/lines_test.shp'

shp = gpd.read_file(absolute_path_to_file)

shp.geometry = shp.apply(lambda row: make_valid(row.geometry) if not explain_validity(row.geometry) == 'Valid Geometry' else row.geometry, axis=1)

print(shp)

It returns the following:

   id                                           geometry
0   1  MULTILINESTRING ((98573.148 6110418.666, 40758...
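
As the quoted make_valid documentation says, a repair can return a GeometryCollection when the pieces are of different types, which may be awkward in a polygon layer. One possible way to deal with that, sketched below, is to keep only the polygonal pieces. The helper polygonal_part is hypothetical (it is not part of shapely or geopandas), and the spiked polygon is invented test data.

```python
from shapely.geometry import MultiPolygon, Polygon
from shapely.validation import make_valid


def polygonal_part(geom):
    """Keep only the Polygon/MultiPolygon pieces of a geometry (hypothetical helper)."""
    if geom.geom_type in ("Polygon", "MultiPolygon"):
        return geom
    if geom.geom_type == "GeometryCollection":
        polys = []
        for part in geom.geoms:
            if part.geom_type == "Polygon":
                polys.append(part)
            elif part.geom_type == "MultiPolygon":
                polys.extend(part.geoms)
        return MultiPolygon(polys)
    # Lines, points, etc. have no polygonal content: return an empty MultiPolygon.
    return MultiPolygon()


# A triangle with a zero-width spike running from (0, 1) up to (0, 2) and back.
spiky = Polygon([(0, 2), (0, 1), (2, 0), (0, 0), (0, 2)])

repaired = make_valid(spiky)         # may contain the spike as a separate line
cleaned = polygonal_part(repaired)   # the line piece, if any, is dropped

print(repaired.geom_type)
print(cleaned.geom_type, cleaned.is_valid, cleaned.area)
```

On a GeoDataFrame the same helper could be chained after the repair, for example shp.geometry.apply(make_valid).apply(polygonal_part). Dropping the non-polygonal pieces discards information, so whether that is acceptable depends on the data.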
