[GIS] Find duplicate vertices in SQL Server geometry line (ArcSDE)

enterprise-geodatabasespatial-databasesql server

I have a line in a ZM polyline featureclass that has a invalid geometry. My suspicion is that the line doubles back on itself somewhere, which I've found SQL Server doesn't like. Anyone know of a quick SQL method or query that could help me identify the suspect bad points that are jacking up my geometry? The string representation looks like this:

1835815.86 12887142.42 0 0, 1835816.72 12887142.68 170 170, 1835817.53 12887142.76 349.99 350, 1835817.52 12887142.76 559.99 560, 1835817.78 12887142.76 659.99 660, ....

Also, I'm wondering if I could use a regular expression and a look ahead and/or look behind to find duplicate numbers??

Best Answer

Here's one way with Python. Get the linestring binary as string out of the database:

select shape.ToString() from table_in_sde

then take that and put it into a variable in python, use some regex, list and dictionary goodness to find the dups (to be honest, I Googled to find the dictionary dup stuff):

>>> import re
>>> s = 'LINESTRING (1835815.86 12887142.42 0 0, 1835816.72 12887142.68 170 170, 1835817.53 12887142.76 349.99 350, 1835817.52 12887142.76 559.99 560,....)'
>>> l = re.findall(r'(\d+.\d{2})\s',s)

>>> icount = {}
>>> for i in l:
...     icount[i] = icount.get(i,0) + 1

>>> for key, value in icount.iteritems():
...     if value > 1:
...             dups[key] = value
...
>>> dups
{'12887142.42': 2, '12887142.76': 3, '3081.28': 2}
>>>

The third dictionary item is irrelevant (it's Z values and they can be present muliple times). The first two items are duplicate Y values. Key is the coordinate, value is the count of how many times it appears in the string.