[GIS] Fastest way to write large numpy arrays to feature class: as rows, or as columns

arcpy, array, data-structure, feature-class, numpy

I have a numpy data-processing loop that repeats ~50,000 times and outputs 23 data values per iteration. I want to write this data to the attribute table of my ~50,000-point multipoint feature class.
With each loop iteration, I generate 8 individual values plus 5 sets of three values. (I'm still deciding whether to group them as 3 sets of 5 or 5 sets of 3; that will depend on the answer below.)

With this number of iterations, I want to be mindful of processing time. How should I approach organizing & writing the data? Should I:

  1. …write 23 fields of data to the output file row-by-row at the end of each loop? (Or is the processing time of writing to file too long to do ~50Kx?)
  2. …or collect them into 23 separate ~50K-length numpy arrays and write them column-by-column after all looping is done? (Or would I likely run into memory issues?)

Here's my approach for the second method:

#23 data columns to be added
fieldlist = ["azimuth", "orient", "crit#f", "freq1", "amp1", "pwr%1", "freq2", "amp2", "pwr%2", "freq3", "amp3", "pwr%3", "freq4", "amp4", "pwr%4", "freq5", "amp5", "pwr%5", "Nspurs", "len_mean", "len_sd", "ht_mean", "ht_sd"]

for fieldname in fieldlist:
    arcpy.AddField_management(gridcopy, fieldname, "FLOAT") #Add new field
    with arcpy.da.UpdateCursor(gridcopy, [fieldname]) as cursor: #Add data for this field
        i = 0
        for row in cursor:
            row[0] = X[i] #X: the ~50K-length array of values for this field
            cursor.updateRow(row)
            i += 1
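
For context, the collection step I have in mind is roughly this (just a sketch; process_point and the other names are placeholders for my actual loop):

import numpy as np

n = 50000  #number of loop iterations / features
#one preallocated ~50K-length column array per output field
columns = {name: np.empty(n, dtype=np.float32) for name in fieldlist}

for i in range(n):
    values = process_point(i)  #placeholder: returns the 23 values for feature i
    for name, value in zip(fieldlist, values):
        columns[name][i] = value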

Best Answer

If the output feature class doesn't already contain data you need to preserve, it might be faster to insert new rows instead of updating existing ones. For this, you would use da.InsertCursor.
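
A minimal sketch of that approach, assuming a simple point feature class and that the geometries and the results line up record for record (X is the results array from the question; points, a list of (x, y) tuples, is an illustrative name):

import arcpy

#SHAPE@XY works for simple point geometries; the attribute fields follow
with arcpy.da.InsertCursor(gridcopy, ["SHAPE@XY"] + fieldlist) as cursor:
    for xy, values in zip(points, X):         #points: list of (x, y) tuples
        cursor.insertRow([xy] + list(values)) #one new feature per result row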

If you still would like to update the rows, then do the following:

#23 data columns to be added
fieldlist = ["azimuth", "orient", "crit#f", "freq1", "amp1", "pwr%1", "freq2", "amp2", "pwr%2", "freq3", "amp3", "pwr%3", "freq4", "amp4", "pwr%4", "freq5", "amp5", "pwr%5", "Nspurs", "len_mean", "len_sd", "ht_mean", "ht_sd"]

for fieldname in fieldlist:
    arcpy.AddField_management(gridcopy, fieldname, "FLOAT") #Add new fields

with arcpy.da.UpdateCursor(gridcopy, fieldlist) as cursor: #Add data
    for i, row in enumerate(cursor):
        row[:] = X[i] #assumes X is a (n_features, 23) array in cursor order
        cursor.updateRow(row)

~50K rows is nothing; they will be updated in no time at all. (Memory isn't a concern either: a 50,000 × 23 array of 64-bit floats is under 10 MB.) Since you are working with numpy arrays, it is also worth checking the arcpy.da functions for converting numpy arrays into feature classes, such as NumPyArrayToFeatureClass, which converts a NumPy structured array to a point feature class. I've used the arcpy.da functions on large numpy arrays (millions of records) and they are very fast.
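
For example, a minimal sketch of NumPyArrayToFeatureClass (the array layout, output path, and spatial reference here are assumptions for illustration, not from the question):

import numpy as np
import arcpy

#structured array: coordinates plus, for brevity, two of the 23 attribute columns
arr = np.zeros(50000, dtype=[("X", "<f8"), ("Y", "<f8"),
                             ("azimuth", "<f4"), ("orient", "<f4")])
#... fill arr from the processing loop ...

out_fc = r"C:\GIS\Temp\Default.gdb\out_points" #assumed output path
sr = arcpy.SpatialReference(4326)              #assumed spatial reference (WGS 84)
arcpy.da.NumPyArrayToFeatureClass(arr, out_fc, ("X", "Y"), sr)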

Performance update:

It is faster to use the UpdateCursor than to first create geodatabase tables from the numpy arrays and join them, because creating new objects in a geodatabase has overhead and ArcGIS-based joins are slow. Look at the execution times below: updating the actual rows takes about 2 seconds; the slowest parts are importing arcpy and getting the feature count:

from functools import wraps
import numpy as np
import time

fields_list = ['field{}'.format(i) for i in xrange(1,24)]
fc = r'C:\GIS\Temp\ArcGISHomeFolder\Default.gdb\points'

#----------------------------------------------------------------------
def report_time(func):
    '''Decorator reporting the execution time'''
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.time()
        result = func(*args, **kwargs)
        end = time.time()
        print(func.__name__, round(end-start, 3))
        return result
    return wrapper

@report_time
#----------------------------------------------------------------------
def setup():
    """set up sample data and add fields to feature class"""
    import arcpy #imported here so the import time is counted in setup's timing
    total_features = int(arcpy.GetCount_management(fc).getOutput(0))
    print total_features
    #uncomment on the first run to add the output fields:
    #for f in fields_list:
        #arcpy.AddField_management(in_table=fc, field_name=f, field_type='FLOAT')
    data = np.random.uniform(low=0.5, high=13.3, size=(total_features, 23))
    return data

@report_time
#----------------------------------------------------------------------
def update_cursor(data):
    """update feature class fields using da.UpdateCursor"""
    import arcpy #already loaded in setup(), so this re-import is essentially free
    with arcpy.da.UpdateCursor(fc, fields_list + ['OID@']) as cursor: #last field is the ObjectID
        for row in cursor:
            row[0:23] = data[row[-1] - 1] #assumes ObjectIDs run 1..N with no gaps
            cursor.updateRow(row)

#----------------------------------------------------------------------
def main():
    """"""
    data = setup()
    update_cursor(data)

if __name__ == '__main__':
    main()

49696
('setup', 20.678)
('update_cursor', 1.837)