[GIS] Invalid geometries made valid don’t remain valid

ogrpythonself-intersection

From https://s.geo.admin.ch/77395652c7 I got the groundwater resources data for Switzerland. (deeplink to ZIP with Shapefiles, 147.3 MB).

Some areas therein are self-touching*, which makes gdalwarp unhappy when I try to use them to mask a raster:

gdalwarp cutline GK500/data/GK500_V1_3_Vector/GK500_V1_3_DE/Shapes/PY_Basis_Flaechen.shp -cwhere 'H2_ID = 6 OR H2_ID = 11' -crop_to_cutline 'MeteoSchweiz/Globalstrahlung/msg_SIS_Y_ch02_lonlat_2016_2016/msg.SIS.Y_ch02.lonlat_20160101000000_20160101000000.nc' out.tiff

results in

Warning 1: Ring Self-intersection at or near point 628843.78999999911 271660.44999999925 0
ERROR 1: Cutline polygon is invalid.

From PostGIS, I remembered the trick/hack to make such almost-valid polygons valid by "buffering" them with an infinitely small buffer (i.e. a buffer of width 0). This seems to work with OGR's python bindings, too:

from osgeo import ogr

gk500 = ogr.Open(
    'GK500/data/GK500_V1_3_Vector/GK500_V1_3_DE/Shapes/PY_Basis_Flaechen.shp')
gk_lyr = gk500.GetLayer()

for feature in gk_lyr:
    geom = feature.GetGeometryRef()
    if not geom.IsValid():
        feature.SetGeometryDirectly(geom.Buffer(0))
        assert feature.GetGeometryRef().IsValid()  # Doesn't fail

Though, when I then run

gk_lyr.ResetReading()

assert all(feature.GetGeometryRef().IsValid() for feature in gk_lyr)  # fails

that subsequent assertion fails. Why is that?

When iterating over the layer anew, are the features read from disk again?

Or is ogr.Feature.SetGeometryDirectly() not quite doing what I think it should do?

Please note that this question is not about how to solve this problem. If you want to provide your solution in your answer, fine, but I mostly want to know what is going on, so that I can form the right conceptual mental model about OGR.


*As far as I can tell, none are self-overlapping.

Best Answer

There are a couple small reasons that, in combination, are causing your assertion to fail.

Firstly, in the OGR/GDAL python bindings, any changes to files are only saved when the variable is deconstructed by either 1) the python garbage collector or 2) setting the python object equal to None to reset the variable.

If you are calling your two code blocks in sequence (in the same python code) without closing the variable (or allowing it to be closed automatically by the python garbage collector at the end of the script), the second assertion will fail because your changes to the shapefile have not been saved to the layer.

Method

I've developed my own code that I use quite frequently to clean shapefiles before I perform operations on them. It checks for both sliver polygons (by checking the area/perimeter ratio) and self-intersection polygons and generates a new shapefile (saved in memory to a python object) only containing valid geometries.

from osgeo import gdal, osr, ogr

def removeSliver(ds, EPSG):
    lyr = ds.GetLayer()
    out_ds = ogr.GetDriverByName('Memory').CreateDataSource('')
    srs = osr.SpatialReference()
    srs.ImportFromEPSG(EPSG)
    out_lyr = out_ds.CreateLayer('', srs)
    for feature in lyr:
        geom = feature.GetGeometryRef()
        perimeter = feature.GetGeometryRef().Boundary().Length()
        area = feature.GetGeometryRef().GetArea()
        sliver = float(perimeter/area)
        if sliver <1: 
            if geom.IsValid() == True:
                wkt = geom.ExportToWkt()
                new_poly = ogr.CreateGeometryFromWkt(wkt)
                new_feat = ogr.Feature(out_lyr.GetLayerDefn())
                new_feat.SetGeometry(new_poly)
                out_lyr.CreateFeature(new_feat)
                new_feat = None
            else:
                new_feat = ogr.Feature(out_lyr.GetLayerDefn())
                new_feat.SetGeometry(geom.Buffer(0))
                out_lyr.CreateFeature(new_feat)
                new_feat = None
    out_lyr = None
    return out_ds

With this function it is very easy to clean a shapefile, and then do something to that shapefile:

ds = ogr.Open('/path/to/file.shp')
ds_clean = removeSliver(ds, 4326)
gdal.Warp('out.tif', 'input.tif', cutlineDSName=ds_clean, cropToCutline=True)

An added benefit is that the cleaned shapefile is only ever stored in memory while the original shapefile is never altered which maintains the integrity of the original shapefile.

Some Additional Notes:

Self-intersecting polygons and other invalid geometries don't actually causes errors in the output of gdalwarp, but instead are simply skipped when the utility attempts to perform a clip involving the invalid geometry. Depending on your use-case, you may or may not want to fix the geometry before clipping. Buffering a polygon by 0 will fix the self-intersection error but does not always produce intended results.

Related Question