[GIS] Python MemoryError when running ArcGIS Python Toolbox tool

#memory #error #arcgis-10.1 #arcpy #python-toolbox

I have a Python Toolbox (.pyt) containing a single tool. Part of that tool is the following function, which creates a dictionary from an attribute table, where the keys are the rows' OBJECTIDs and the values are pulled from a field specified when calling the function. The attributes are accessed with a generator based on an arcpy.da.SearchCursor.

def make_oid_dict(fc, value_field):
    describe = arcpy.Describe(fc)
    key_field = describe.OIDFieldName
    key_value_pairs = (row for row in arcpy.da.SearchCursor(fc, (key_field, value_field)))
    oid_dict = dict(key_value_pairs)
    return oid_dict

I want to use the dictionary to access attribute values of the feature IDs in a Near Table. I've been trying to run the tool, but every time it calls this function — for a point feature class with about 1.25 million rows and a value_field that is formatted as a Long integer — it fails with a Python MemoryError.
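For context, the intended lookup looks roughly like this (a sketch with mock data, not code from the tool; `NEAR_FID` is the field a Near Table uses for the near feature's OID):

```python
# Mock OID dictionary and Near Table rows; in the real tool the dict comes
# from make_oid_dict() and the rows from a cursor over the Near Table.
oid_dict = {101: 5, 102: 9, 103: 2}

near_rows = [
    (1, 103),  # (IN_FID, NEAR_FID)
    (2, 101),
]

# Look up the value_field attribute of each near feature by its OID.
values = [oid_dict[near_fid] for in_fid, near_fid in near_rows]
print(values)  # [2, 5]
```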

I have successfully called the function on the same feature class from the Python window in ArcCatalog, so I have no idea why it wouldn't be working from within the Python toolbox. The resultant dictionary isn't even that large: just 24 MB, according to sys.getsizeof().
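(One caveat about that number, as an aside: sys.getsizeof() reports only the size of the dict's own hash table, not the key and value objects it references, so the true footprint is larger than 24 MB. A minimal illustration:)

```python
import sys

# Build a dict shaped like the OID dictionary: int keys, int values.
oid_dict = {oid: oid * 10 for oid in range(100000)}

# getsizeof() measures only the dict container itself.
shallow = sys.getsizeof(oid_dict)

# A rough "deep" size also counts every key and value object.
deep = shallow + sum(sys.getsizeof(k) + sys.getsizeof(v)
                     for k, v in oid_dict.items())

print(shallow, deep)  # deep is substantially larger than shallow
```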

Any ideas what the problem might be?


Edit 1: I've updated the function to use a dictionary comprehension as follows, per Jason Scheirer's suggestion, but that unfortunately has not solved the MemoryError issue.

def make_oid_dict(fc, value_field):
    describe = arcpy.Describe(fc)
    key_field = describe.OIDFieldName
    oid_dict = {row[0]: row[1] for row in arcpy.da.SearchCursor(fc, (key_field, value_field))}
    return oid_dict

Edit 2: This is an update to clarify the solution(s).

My make_oid_dict() function works for me without throwing a MemoryError if and only if the tool is run using background processing (i.e. by setting self.canRunInBackground = True in the tool's __init__ method). Thanks to Aaron for pointing me in this direction.

In the case that the tool needs to run in foreground processing mode, however, Jason Scheirer's make_oid_array() function will run successfully — or at least it did for me.

Best Answer

See if switching to a dictionary comprehension helps:

def make_oid_dict(fc, value_field):
    key_field = arcpy.Describe(fc).OIDFieldName
    with arcpy.da.SearchCursor(fc, (key_field, value_field)) as cur:
        oid_dict = { row[0]: row[1] for row in cur }
    return oid_dict

And if you're still running out of memory, try recasting the value to int (a Python long can use up a lot more memory than an int):

def make_oid_dict(fc, value_field):
    key_field = arcpy.Describe(fc).OIDFieldName
    with arcpy.da.SearchCursor(fc, (key_field, value_field)) as cur:
        oid_dict = { row[0]: int(row[1]) for row in cur }
    return oid_dict

Another alternative, which may not benchmark as fast, is to use the array module. Note that this approach does not handle sparsely populated OIDs efficiently: if a table has OIDs [1, 2, 3, 5], a placeholder entry (0) will also be stored for the missing OID 4.

import array

def make_oid_array(fc, value_field):
    oid_array = array.array('L')  # unsigned long values, indexed by OID
    key_field = arcpy.Describe(fc).OIDFieldName
    with arcpy.da.SearchCursor(fc, (key_field, value_field)) as cur:
        for row in cur:
            oid, value = row
            if oid > len(oid_array) - 1:
                # Pad the array with zeros up to and including this OID.
                oid_array.extend((0 for x in xrange(oid + 1 - len(oid_array))))
            oid_array[oid] = int(value)
    return oid_array

Filling this data structure while looping through the cursor will not be nearly as fast, but the result is far more compact in memory. A numpy array would work as well.
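The sparse-fill behaviour can be exercised without arcpy by feeding the same loop mock (oid, value) tuples (a sketch; `rows` stands in for the cursor, and `range` replaces Python 2's `xrange`):

```python
import array

def make_oid_array_from_rows(rows):
    # Same logic as make_oid_array(), but over plain (oid, value) tuples.
    oid_array = array.array('L')
    for oid, value in rows:
        if oid > len(oid_array) - 1:
            # Pad the array with zeros up to and including this OID.
            oid_array.extend(0 for _ in range(oid + 1 - len(oid_array)))
        oid_array[oid] = int(value)
    return oid_array

rows = [(1, 10), (2, 20), (3, 30), (5, 50)]
result = make_oid_array_from_rows(rows)
# Index 4 holds a placeholder 0 for the missing OID 4.
print(list(result))  # [0, 10, 20, 30, 0, 50]
```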
