[GIS] Polygon with the same points over and over

c# shapefile

I am in the process of reading shapefiles (provided to us by the vendor dealing with GIS products) using a C# program and loading them into a SQL database. I first read the text for the shape and then use that text to update my geometry and geography types. I am doing this as part of an SSIS package.

My SSIS package failed on one shapefile and threw me an exception:

A .NET Framework error occurred during execution of user-defined
routine or aggregate "geometry": System.FormatException: 24305: The
Polygon input is not valid because the ring does not have enough
distinct points. Each ring of a polygon must contain at least three
distinct points.

After running some manual updates, I found the record that was giving me grief. Then I took out parts of the multi-polygon and ran individual selects on them, creating individual polygons, e.g.

SELECT geometry::STGeomFromText('POLYGON(( 118.501323586697 -20.3203577291617, 118.504216161911 -20.3220101539757, 118.502671059623 -20.3212136027048, 118.501323586697 -20.3203577291617 ))', 4326).MakeValid()

Finally, I found the polygon that was the problem shape: 'POLYGON(( 118.860739531873 -20.2274797478397, 118.860739531873 -20.2274797478397, 118.860739531873 -20.2274797478397, 118.860739531873 -20.2274797478397 ))'
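To show what the error is complaining about, the condition can be checked outside SQL Server. This is only a minimal Python sketch, not part of my SSIS package: the helper name `ring_distinct_counts` is made up for illustration, and the regex-based parsing assumes plain 2D WKT with no nested constructs beyond rings.

```python
import re

def ring_distinct_counts(wkt):
    """Return the number of distinct points in each ring of a WKT polygon."""
    # A ring is the innermost parenthesised list of "x y" pairs.
    rings = re.findall(r"\(([^()]+)\)", wkt)
    counts = []
    for ring in rings:
        # Parse each "x y" pair to floats and collect them in a set,
        # so repeated vertices are only counted once.
        points = {tuple(float(v) for v in pair.split()) for pair in ring.split(",")}
        counts.append(len(points))
    return counts

good = ("POLYGON(( 118.501323586697 -20.3203577291617, "
        "118.504216161911 -20.3220101539757, "
        "118.502671059623 -20.3212136027048, "
        "118.501323586697 -20.3203577291617 ))")
bad = ("POLYGON(( 118.860739531873 -20.2274797478397, "
       "118.860739531873 -20.2274797478397, "
       "118.860739531873 -20.2274797478397, "
       "118.860739531873 -20.2274797478397 ))")

print(ring_distinct_counts(good))  # [3]
print(ring_distinct_counts(bad))   # [1]
```

The first polygon has three distinct points and loads; the problem polygon has only one, which is below the minimum of three named in the exception.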

I am trying to understand how to fix this so my SSIS package runs and translates and loads my SQL tables without me having to fix up anything manually. Any help will be greatly appreciated.

Best Answer

Any ETL process is about digesting data. Somewhere along your path, you are trying to digest bad data.

So how would you write a system that tries to digest, say, a point and tries to load it into a polygon?

Sure, you can write something that lets you digest it. If it is a point, well, then buffer it by 5 meters! Bam! You have a digestible geometry without manual intervention.

But that is not the point.

Currently you are thinking of your ETL process as a binary black box for your user ("works" vs "does not work") - and you want the "does not work" to go away.

This is fundamentally a fallacy.

Think of your ETL process as a series of gates instead. Some things can pass, and some things cannot. That crap polygon you have there almost certainly came from a geoprocessing function or a topology snapping operation where the geometry collapsed onto itself because of some tolerance problem.

You don't want that in your GIS until it is fixed.

The gate should stop it, because, trust me, that polygon will cause more problems if it is let inside the rest of your GIS.

My point is that silent failures are, most of the time (with some exceptions), a bad approach - even more so for ETL.
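To make the gate idea concrete, here is a rough sketch in Python of one such gate, assuming the shapes arrive as `(record_id, wkt)` pairs. The names `rejection_reason` and `gate` are hypothetical, the parsing only understands plain 2D WKT, and it checks only the "at least three distinct points per ring" rule from the error message, not validity in general. The point is that rejected rows are kept with a reason, rather than being silently repaired or silently dropped.

```python
import re

MIN_DISTINCT_POINTS = 3  # the minimum quoted in error 24305

def rejection_reason(wkt):
    """Return None if every ring has enough distinct points, else a reason."""
    # A ring is the innermost parenthesised list of "x y" pairs.
    rings = re.findall(r"\(([^()]+)\)", wkt)
    if not rings:
        return "no rings found"
    for index, ring in enumerate(rings):
        distinct = {tuple(float(v) for v in pair.split()) for pair in ring.split(",")}
        if len(distinct) < MIN_DISTINCT_POINTS:
            return f"ring {index} has only {len(distinct)} distinct point(s)"
    return None

def gate(records):
    """Split (record_id, wkt) pairs into loadable rows and rejected rows."""
    accepted, rejected = [], []
    for record_id, wkt in records:
        reason = rejection_reason(wkt)
        if reason is None:
            accepted.append((record_id, wkt))
        else:
            # Keep the reason so someone upstream can fix the source data.
            rejected.append((record_id, wkt, reason))
    return accepted, rejected

records = [
    (1, "POLYGON((0 0, 1 0, 1 1, 0 0))"),
    (2, "POLYGON((5 5, 5 5, 5 5, 5 5))"),
]
accepted, rejected = gate(records)
for record_id, _, reason in rejected:
    print(f"record {record_id} rejected: {reason}")
```

The accepted rows continue to the load step, and the rejected ones go to a report or quarantine table that gets sent back to whoever produced the data. The package still completes, but the failure is visible instead of hidden.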