[GIS] Splitting strings into new rows, retaining geometry

arcgis-10.1arcgis-desktopattribute-tablefmer

UPDATED BELOW

I have 9 shapefiles, each shapefile has 700,000+ rows and I have roughly 12 attribute fields in each shapefile.

There are three attribute fields, common to every shapefile, that I am interested in – Type, Code, Area. In the majority of records, there is one entry for each field (| B | X2 | 3.45 |) but in some cases you have more than one entry in a field (up to 8) separated by comma;

| A, B, B | X2, ZZ9, GGR | 1.2, 3.1, 0.23 |

It is nearly always the case that if there is 3 records in one field, the other two fields concerned will also have 3 records. If this does not apply, then the data is incomplete and can be ignored. It is also the case that the 1st entry in one field is associated with the 1st entry in another field.

I have found how to Splitting string in Python parser of ArcGIS Field Calculator? and ArcMap 10.1 Python split inconsistent string but that results in deconcatenating the string by the comma delimiter and creating new attribute fields for every comma separated entry.

The thing i need is for the deconcatenated data to create new records NOT new fields. In the instance above;

 1. | A, B, B | X2, ZZ9, GGR | 1.2, 3.1, 0.23 |

would become

 1. | A | X2 | 1.2 | 
 2. | B | ZZ9 | 3.1 | 
 3. | B | GGR | 0.23 |

the top one is the original record with the superfluous data removed into two new records below it. I need the new records to retain all the other information from the other 9 fields that don't need to be touched and also the geometry as they refer to the same polygon space.

thanks very much, I'm stuck!

(Answers for ArcGIS 10.1, FME, R, Python all gladly accepted)

EDIT 9th APR:

i used the following code to split concatenated entries (by a comma) in fields 'TYPE' and 'AREA' into new rows (there are between 1 and 8 concatenated entries in TYPE and AREA):

import arcpy
searchC =  arcpy.da.SearchCursor("FeatureClass", ("TYPE","AREA", "SHAPE@"))

insertC = arcpy.da.InsertCursor("newFeatureClass", ("TYPE","AREA", "SHAPE@")) 

for row in searchC:
   A = [x.split(",") for x in row[:-1] ]

   for i in range(min(len(A[0]),len(A[1]))):
       newRow = [A[0][i] , A[1][i], row[2]]
       insertC.insertRow(newRow)
del insertC

this produces the correct amount of new rows based upon the content of the original concatenated rows.

but when doing this, i only get the 2 fields TYPE and AREA next to a new FID. each entry also has a uniqueID that has to come across as well, so i tried (amateurishly) this;

import arcpy
searchC =  arcpy.da.SearchCursor("FeatureClass", ("UNIQUEID", "TYPE", "AREA", "SHAPE@"))

insertC = arcpy.da.InsertCursor("newFeatureClass", ("UNIQUEID", "TYPE","AREA", "SHAPE@")) 

for row in searchC:
    A = [x.split(",") for x in row[:-1] ]

    for i in range(min(len(A[0]),len(A[1]),len(A[2]))):
        newRow = [A[0][i] , A[1][i], A[2][i], row[3]]
        insertC.insertRow(newRow)
del insertC

However, this now leaves the original concatenated row plus one new row that takes only the first entry from Type and AREA. UNIQUEID does populate both rows.

Any help? i want to split out the concatenated entries but make sure the UNIQUEID that applies to all is carried across, thanks very much

Best Answer

you can use something like below :

import arcpy
searchC =  arcpy.da.SearchCursor(featureclass, ("type","code","area" , "SHAPE@")) #plus other attribute if you need

insertC = arcpy.da.InsertCursor(newfeatureclass, ("type","code","area" , "SHAPE@")) 

for row in searchC:
    A = [x.split(",") for x in row[:-1] ]

    for i in range(min(len(A[0]),len(A[1]),len(A[2]))):
        newRow = [A[0][i] , A[1][i], A[2][i], row[3]]
        insertC.insertRow(newRow)
del insertC

you can use a similar code to create a table instead of a feature class, as mentioned by Vince, so that you do not duplicate the geometries but instead create a one to many relation.

Related Question