[GIS] How to transfer attributes to thousands of local points from ArcSDE feature class with millions of polygons

arcgis-desktop, arcmap, arcpy, enterprise-geodatabase, performance

I have a set of points and a huge feature class with topographic information (OS MasterMap, for those who know it). I'm talking millions of polygons. I want to transfer the attributes of the polygons to the points that intersect them.

I've tried Spatial Join, but it ran out of memory. I've tried Identity, and it gives no clue at all about how long it might take, though I'm going to let it run over the weekend to see where it ends up.

I then thought of writing a script to take one point at a time and use it to select the underlying polygon. Doing a Select Layer By Location with the little dialog window in ArcMap works great; it takes less than a second. However, doing the same thing from the Python window takes many minutes. I've started writing a full ArcPy script to run a Select By Location point by point and then update each point's attributes with the selected polygon's attributes. Updating one point took about 15 minutes, and I've got about 8,000 points to do. Unfortunately my boss won't let me sit idle and wait for a script that takes 80 days to run 🙁

Does anyone have an idea of how this might be done more quickly, and why (why, ESRI, why???) Select By Location is so much faster in ArcMap itself?

I should mention that the points feature class is local and the polygons feature class is in ArcSDE.


SOLUTION:

Among many suggestions, the only feasible one for me, being unable to install anything or to use multiple workstations at the same time, was Polygeo's solution; please see the answer below.
The key steps are: set a processing extent to a very small area around each point (inside a cursor), make a one-polygon copy of the large dataset using that processing extent, and read the attributes from the small copy.
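As a minimal illustration of that key step (the function name and coordinates here are hypothetical, not from the original script), the tiny processing extent around each point is just a small rectangle centred on its coordinates; with ArcGIS available it would be assigned via `arcpy.env.extent = arcpy.Extent(xmin, ymin, xmax, ymax)`:

```python
def extent_around(x, y, d=0.1):
    """Return an (xmin, ymin, xmax, ymax) rectangle d units around a point.

    In the real script this tuple would be passed to arcpy.Extent() and
    assigned to arcpy.env.extent before copying out the overlapping polygon.
    """
    return (x - d, y - d, x + d, y + d)

# e.g. a 0.2 m square centred on an example OSGB grid coordinate
print(extent_around(530000.0, 180000.0))
```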

I'm also bound to try a query within the Oracle database itself, but I'll need some assistance from a colleague for that. I'm trying to get PostGIS installed to use that as well. I'll report back on both as soon as I have results.

Best Answer

I think you should be able to get your "Updating one point took about 15 minutes" down to a few seconds by using arcpy.env.extent.

With 8,000 points the approach I am suggesting should complete within a day (worst case), even if you write everything to disk (which I would do in initial testing); an in_memory workspace should trim that further.

  1. arcpy.da.SearchCursor iterates your point feature class to read its coordinates and an identifier
  2. Select_analysis uses the identifier to copy a one-point feature class out of those 8,000
  3. Set arcpy.env.extent to a rectangle that is say a tenth of a metre around the coordinate
  4. CopyFeatures_management copies out the portion of the OSMM layer within the geoprocessing extent, i.e. almost always a single polygon, though if you occasionally strike a boundary you may get a few. This should take only a second or two; I frequently use this procedure on a 3.5-million-polygon cadastre
  5. Intersect_analysis your one-point feature class with your one (or few) polygons feature class. If "transfer attributes of the polygons" does not mean all attributes, then just reading/writing them via cursors may speed this up too
  6. Append_management your 8,000 intersected point feature classes back into a single feature class, or preferably use arcpy.da.InsertCursor to do this part a lot faster
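The steps above could be sketched roughly as follows. This is an untested outline, not a drop-in script: all paths and field names are placeholders, `out_fc` is assumed to already exist with a `SRC_OID` field plus the polygon fields, steps 1 and 2 are collapsed into a single SearchCursor read of coordinates, and step 5 uses the cursor-based attribute transfer the answer mentions rather than Intersect_analysis. arcpy is imported inside the function so the file loads without ArcGIS.

```python
def transfer_poly_attributes(points_fc, polys_fc, poly_fields, out_fc, d=0.1):
    """Sketch of the per-point processing-extent approach (requires ArcGIS)."""
    import arcpy  # deferred: only needed when actually running under ArcGIS

    tmp_poly = "in_memory/one_poly"
    rows_out = []
    # Steps 1-2: read each point's OID and coordinates in one pass
    with arcpy.da.SearchCursor(points_fc, ["OID@", "SHAPE@XY"]) as cur:
        for oid, (x, y) in cur:
            # Step 3: clamp the geoprocessing extent to a tiny box round the point
            arcpy.env.extent = arcpy.Extent(x - d, y - d, x + d, y + d)
            # Step 4: copy out the (almost always single) polygon under the point
            arcpy.CopyFeatures_management(polys_fc, tmp_poly)
            # Step 5: read the wanted attributes straight off the small copy
            with arcpy.da.SearchCursor(tmp_poly, poly_fields) as pcur:
                for vals in pcur:
                    rows_out.append((oid, (x, y)) + tuple(vals))
                    break  # keep the first polygon if a boundary gave us two
            arcpy.Delete_management(tmp_poly)
    arcpy.env.extent = "MAXOF"  # reset the extent once finished
    # Step 6: write everything back in one go with an InsertCursor
    with arcpy.da.InsertCursor(out_fc, ["SRC_OID", "SHAPE@XY"] + poly_fields) as icur:
        for oid, xy, *vals in rows_out:
            icur.insertRow((oid, xy) + tuple(vals))
```

Collecting the attribute rows in a list and writing them with a single InsertCursor at the end avoids opening 8,000 short-lived edit operations, which is where Append_management would lose time.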

All in all, focus on testing step 4 first - if that is taking more than a few seconds then multiplying it by 8,000 becomes an issue.

Take care to set arcpy.env.extent back to "MAXOF" once you have finished processing.