Geodatabase Search – Most Efficient Way to Search a Geodatabase for NULL-like Records

arcgis-10.0arcgis-10.1arcpyattribute-table

The problem: I have a geodatabase with several datasets and many more feature classes within. The fields within the feature classes have been populated through joins with shapefiles and manual edits. Often times string fields will become populated with whitespace (i.e. '', ' ', ' ', etc) or the string "Null", and numeric fields will become populated with a zero (0). I would like to find these records and replace them with a true NULL value. I have the following code, which uses an UpdateCursor, but it still very slow and doesn't catch all of the NULL-like records. Does anyone know of other ways to accomplish this task?

GDB = arcpy.GetParameterAsText(0) #input geodatabase
arcpy.env.workspace = GDB
datasetList = arcpy.ListDatasets() #list datasets

for dataset in datasetList:
        arcpy.env.workspace = os.path.join(GDB, dataset)
        fcList = arcpy.ListFeatureClasses()
        for fc in fcList:
                arcpy.AddMessage("Processing %s..." % fc)
                #count features
                arcpy.MakeTableView_management(fc, "tempTableView")
                count = int(arcpy.GetCount_management("tempTableView").getOutput(0))
                if count > 0:
                        fieldList = arcpy.ListFields(fc)
                        for field in fieldList:
                                arcpy.AddMessage("...%s" % field.name)
                                rows = arcpy.UpdateCursor(fc)
                                for row in rows:
                                        count = 0
                                        if row.isNull(field.name):
                                                continue # if already null
                                        elif field.type == "Text":
                                                value = row.getValue(field.name)
                                                if value.lstrip(' ') == '' or value.lower() == '<null>': # looks for whitespace or '<null>'
                                                        row.setNull(field.name)
                                                        count += 1
                                        elif field.type == "ShortInteger" or field.type == "LongInteger" or field.type == "Float" or field.type == "Double":
                                                value = row.getValue(field.name)
                                                if value == 0:
                                                        row.setNull(field.name)
                                                        count += 1
                                        if count > 0: # update row if records have changed
                                                rows.updateRow(row)
                                del rows
                else:
                        arcpy.AddMessage("...NO RECORDS FOUND.")

Best Answer

Since I'm most familiar with 10.1+, and cursors in general are a lot better in the da module, here's a potential solution. Currently, you are creating a cursor each time you change fields, which means you are taking a hit there. Furthermore, you are checking the field type for each record instead of just using the field type once to filter initially.

I've changed how Null values are checked, but I haven't thoroughly tested it to check for all possible values. For the small sample dataset I had, it worked @ 10.2.2.

#Return None if the value needs to be changed, else return the value
def nullify(value):
    x = value    
    if value is not None: #True null fields are read as None types
        if type(value) == str:
            if value.lstrip(' ') == '' or value.lower() == '<null>':
                x = None
        else: 
            if value == 0:
                x = None 

return x

#We're only interested in some fields
ftypes = ("String", "SmallInteger", "Integer", "Double")
fieldList = [f.name for f in arcpy.ListFields(fc) if f.type in ftypes]

with arcpy.da.UpdateCursor(fc, fieldList) as rows:
    for row in rows:            
        nulled = map(nullify, row)
        if row != nulled: #Only update if the row actually needs to be changed.
            rows.updateRow(nulled)
Related Question