ArcPy Performance – Improving Performance with ArcGIS Cursors in Python

arcpy, cursor, indexing, performance

I have a pretty big point feature class in a file geodatabase (~4 000 000 records). This is a regular grid of points with a 100m resolution.

I need to perform a kind of generalization on this layer. For this, I create a new grid where each point lies in the middle of 4 "old" points:

 *     *     *     *
    o     o     o
 *     *     *     *
    o     o     o
 *     *     *     *

[*] = point of the original grid – [o] = point of the new grid

The attribute value of each new point is calculated from the weighted values of its 4 neighbors in the old grid. I thus loop over all the points of my new grid and, for each of them, loop over all the points of my old grid to find its neighbors (by comparing the X and Y values in the attribute table). Once the 4 neighbors have been found, the inner loop exits.
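In outline, the approach looks like the minimal sketch below. It is only an illustration: the tiny in-memory grid stands in for the real feature class and cursors, the names `old_points`, `new_points`, and `find_neighbors` are hypothetical, and equal weights are assumed for the 4 neighbors.

```python
# Brute-force neighbor search, as described above (all names are hypothetical).
STEP = 100.0  # grid resolution in metres

# Tiny stand-in for the old grid: (x, y, value) tuples instead of feature class rows.
old_points = [(x * STEP, y * STEP, float(x + y)) for x in range(4) for y in range(4)]
# New grid: one point in the middle of each cell of the old grid.
new_points = [((x + 0.5) * STEP, (y + 0.5) * STEP) for x in range(3) for y in range(3)]


def find_neighbors(nx, ny, points, half=STEP / 2):
    """Scan the old grid until the 4 corner points around (nx, ny) are found."""
    found = []
    for ox, oy, value in points:
        # A neighbor is offset by exactly half a cell in both X and Y.
        if abs(abs(ox - nx) - half) < 1e-6 and abs(abs(oy - ny) - half) < 1e-6:
            found.append(value)
            if len(found) == 4:
                break  # leave the inner loop once 4 neighbors are found
    return found


results = {}
for nx, ny in new_points:
    neighbors = find_neighbors(nx, ny, old_points)
    results[(nx, ny)] = sum(neighbors) / len(neighbors)  # equal weights assumed
```

With about 4 000 000 points in each grid, the inner scan can run up to 4 000 000 times per new point, so the worst case is on the order of 10^13 comparisons, which is consistent with the runtime I am seeing.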

There is no methodological complexity here, but my problem is that, based on my first tests, this script would take weeks to complete…

Do you see any way to make it more efficient? A few ideas off the top of my head:

  • Index the fields X and Y => I did that but didn't notice any significant performance change
  • Do a spatial query to find the neighbors rather than an attribute-based one. Would that actually help? Which spatial function in ArcGIS would do the job? I doubt that, e.g., buffering each new point would prove more efficient
  • Transform the feature class into a NumPy array. Would that help? I haven't worked much with NumPy so far and I wouldn't like to dive into it unless someone tells me it might really help reduce the processing time
  • Anything else?

Best Answer

What if you fed the points into a NumPy array and used a SciPy cKDTree to look for neighbors? I process LiDAR point clouds with large numbers of points (> 20 million) in several MINUTES using this technique. The SciPy documentation covers cKDTree, and the ArcGIS documentation covers the NumPy conversion. Basically, you read the x,y into an array and iterate over each point in the array, finding the indices of points within a certain distance (neighborhood) of each point. You can then use these indices to calculate other attributes.
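A minimal sketch of the idea, under some assumptions: the grids below are synthetic stand-ins for the real data, the attribute values are made up, and inverse-distance weighting is just one possible choice of weights. It also uses a 4-nearest-neighbors query (`cKDTree.query` with `k=4`) rather than a radius search, since each new point has exactly 4 old neighbors; `cKDTree.query_ball_point` would be the radius-based equivalent.

```python
import numpy as np
from scipy.spatial import cKDTree

STEP = 100.0  # grid resolution in metres

# Synthetic old grid (5 x 4 points). With real data, the coordinates and the
# attribute could be read with arcpy.da.FeatureClassToNumPyArray and the
# SHAPE@X / SHAPE@Y tokens; that step is assumed here and not shown.
xs, ys = np.meshgrid(np.arange(5) * STEP, np.arange(4) * STEP)
old_xy = np.column_stack([xs.ravel(), ys.ravel()])
old_values = old_xy[:, 0] + old_xy[:, 1]  # made-up attribute for illustration

# New grid: one point in the middle of each cell of the old grid.
nxs, nys = np.meshgrid((np.arange(4) + 0.5) * STEP, (np.arange(3) + 0.5) * STEP)
new_xy = np.column_stack([nxs.ravel(), nys.ravel()])

# Build the tree once, then query all new points in a single call.
tree = cKDTree(old_xy)
dist, idx = tree.query(new_xy, k=4)  # both arrays have shape (n_new, 4)

# Inverse-distance weights (an assumption); on a regular grid all 4 are equal.
weights = 1.0 / dist
weights /= weights.sum(axis=1, keepdims=True)
new_values = (weights * old_values[idx]).sum(axis=1)
```

The tree is built once and each lookup is a fast tree search rather than a full scan of the old grid, which is where the speed-up over nested cursor loops comes from. The resulting `new_values` array would still need to be written back to a feature class.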
