[GIS] Python MemoryError when running ArcGIS Python Toolbox tool

#memory #error #arcgis-10.1 #arcpy #python-toolbox

I have a Python Toolbox (.pyt) containing a single tool. Part of that tool is the following function, which creates a dictionary from an attribute table, where the keys are the rows' OBJECTIDs and the values are pulled from a field specified when calling the function. The attributes are accessed with a generator based on an arcpy.da.SearchCursor.

def make_oid_dict(fc, value_field):
    describe = arcpy.Describe(fc)
    key_field = describe.OIDFieldName
    key_value_pairs = (row for row in arcpy.da.SearchCursor(fc, (key_field, value_field)))
    oid_dict = dict(key_value_pairs)
    return oid_dict

I want to use the dictionary to access attribute values of the feature IDs in a Near Table. I've been trying to run the tool, but every time it calls this function — for a point feature class with about 1.25 million rows and a value_field that is formatted as a Long integer — it fails with a Python MemoryError.
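For context, the intended lookup looks roughly like this (a sketch with mock data, not code from the tool; `NEAR_FID` is the field a Near Table uses for the near feature's OID):

```python
# Mock OID dictionary and Near Table rows; in the real tool the dict comes
# from make_oid_dict() and the rows from a cursor over the Near Table.
oid_dict = {101: 5, 102: 9, 103: 2}

near_rows = [
    (1, 103),  # (IN_FID, NEAR_FID)
    (2, 101),
]

# Look up the value_field attribute of each near feature by its OID.
values = [oid_dict[near_fid] for in_fid, near_fid in near_rows]
print(values)  # [2, 5]
```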

I have successfully called the function on the same feature class from the Python window in ArcCatalog, so I have no idea why it wouldn't be working from within the Python toolbox. The resultant dictionary isn't even that large: just 24 MB, according to sys.getsizeof().
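(One caveat about that number, as an aside: sys.getsizeof() reports only the size of the dict's own hash table, not the key and value objects it references, so the true footprint is larger than 24 MB. A minimal illustration:)

```python
import sys

# Build a dict shaped like the OID dictionary: int keys, int values.
oid_dict = {oid: oid * 10 for oid in range(100000)}

# getsizeof() measures only the dict container itself.
shallow = sys.getsizeof(oid_dict)

# A rough "deep" size also counts every key and value object.
deep = shallow + sum(sys.getsizeof(k) + sys.getsizeof(v)
                     for k, v in oid_dict.items())

print(shallow, deep)  # deep is substantially larger than shallow
```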

Any ideas what the problem might be?


Edit 1: I've updated the function to use a dictionary comprehension as follows, per Jason Scheirer's suggestion, but that unfortunately has not solved the MemoryError issue.

def make_oid_dict(fc, value_field):
    describe = arcpy.Describe(fc)
    key_field = describe.OIDFieldName
    oid_dict = {row[0]: row[1] for row in arcpy.da.SearchCursor(fc, (key_field, value_field))}
    return oid_dict

Edit 2: This is an update to clarify the solution(s).

My make_oid_dict() function works for me without throwing a MemoryError if and only if the tool is run using background processing (i.e. by setting self.canRunInBackground = True in the tool's __init__ method). Thanks to Aaron for pointing me in this direction.

In the case that the tool needs to run in foreground processing mode, however, Jason Scheirer's make_oid_array() function will run successfully — or at least it did for me.

Best Answer

See if switching to a dictionary comprehension helps:

def make_oid_dict(fc, value_field):
    key_field = arcpy.Describe(fc).OIDFieldName
    with arcpy.da.SearchCursor(fc, (key_field, value_field)) as cur:
        oid_dict = { row[0]: row[1] for row in cur }
    return oid_dict

And if you're still running out of memory, try recasting the value to int (a Python long can use up a lot more memory than an int):

def make_oid_dict(fc, value_field):
    key_field = arcpy.Describe(fc).OIDFieldName
    with arcpy.da.SearchCursor(fc, (key_field, value_field)) as cur:
        oid_dict = { row[0]: int(row[1]) for row in cur }
    return oid_dict

Another alternative, which may not benchmark as fast, is to use the array module. Note that this approach does not handle sparsely populated OIDs efficiently: if a table has OIDs [1, 2, 3, 5], a placeholder entry (0) will also be stored for the missing OID 4.

import array

def make_oid_array(fc, value_field):
    oid_array = array.array('L')  # unsigned long values, indexed by OID
    key_field = arcpy.Describe(fc).OIDFieldName
    with arcpy.da.SearchCursor(fc, (key_field, value_field)) as cur:
        for row in cur:
            oid, value = row
            if oid > len(oid_array) - 1:
                # Pad the array with zeros up to and including this OID.
                oid_array.extend((0 for x in xrange(oid + 1 - len(oid_array))))
            oid_array[oid] = int(value)
    return oid_array

Filling this data structure while looping through the cursor will not be nearly as fast, but the result is far more compact in memory. A numpy array would work as well.
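The sparse-fill behaviour can be exercised without arcpy by feeding the same loop mock (oid, value) tuples (a sketch; `rows` stands in for the cursor, and `range` replaces Python 2's `xrange`):

```python
import array

def make_oid_array_from_rows(rows):
    # Same logic as make_oid_array(), but over plain (oid, value) tuples.
    oid_array = array.array('L')
    for oid, value in rows:
        if oid > len(oid_array) - 1:
            # Pad the array with zeros up to and including this OID.
            oid_array.extend(0 for _ in range(oid + 1 - len(oid_array)))
        oid_array[oid] = int(value)
    return oid_array

rows = [(1, 10), (2, 20), (3, 30), (5, 50)]
result = make_oid_array_from_rows(rows)
# Index 4 holds a placeholder 0 for the missing OID 4.
print(list(result))  # [0, 10, 20, 30, 0, 50]
```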
