[GIS] Dice a polygon with over 21 million vertices in ArcMap

Tags: arcgis-desktop, arcmap, big data, dice, large datasets

Based on posts like this I suspect I'm bumping up against issues when trying to work with a feature class that contains features with a single very complicated polygon. These errors include 000426, 000072, 99999, and 99998. I'm attempting to use the Project tool, Cut Polygons Tool, Add Geometry Attributes tool, and Feature Class to Feature Class tool. My operations complete successfully on other feature classes with features that have far fewer vertices.

I isolated the feature in the problematic feature class with the greatest number of vertices (over 21 million) and put it in its own shapefile. I've tried using the Dice tool to break this one polygon feature into several smaller features. I've tried setting the vertex limit both low (100) and high (1,000,000), as I'm not sure which would be better for minimizing memory requirements. The tool most often completes with this same error:

ERROR 000426: Out Of Memory

But I've also gotten it to intermittently complete successfully. I've only had success with Dice under these conditions:

  1. Recently restarted computer.
  2. Freshly restarted ArcMap, though not necessarily the first launch since the computer was turned on.
  3. Allow the Dice tool's default recommended output name and location, on the first run of the tool in this ArcMap session (saves to [user]\Documents\ArcGIS\input_file_name_Dice.shp).
  4. Vertex limit set to 1,000,000 (no commas).

After completing successfully under these conditions, I can run Dice again with a different output name and/or location (shapefile or feature class, in a different directory or not), but I cannot change the vertex limit. It will then re-run successfully some unpredictable number of times (perhaps 0-4).

I've already run this feature class through the Repair tool, which crashes ArcMap but completes without error when run as an arcpy standalone script. Conversely, when I run the Dice tool through an arcpy standalone script, it fails the same way as above, but with an additional error prepended:

arcgisscripting.ExecuteError: ERROR 000072: Cannot process feature with OID 0
ERROR 000426: Out Of Memory

I have been able to perform some operations on the original feature class, now with the most complicated polygons removed (such as Project[ion]) which I couldn't do before. But some of the remaining features in the original feature class also have millions of vertices and are causing similar problems.

My system has 16 GB of RAM, and literally has nothing installed on it other than what comes with Windows 10 and the ArcMap 10.6 suite of software.

So here's my question: Am I missing something that would empower me to consistently run the ArcMap Dice tool on these large-number-of-vertices feature classes? Does anyone have workarounds other than restarting the computer, opening ArcMap, immediately running Dice and hoping it works on this attempt? Perhaps I've misidentified the root cause of these errors (but I don't think so).

I should mention that the organization I'm working with requires the exclusive use of ESRI/ArcGIS software tools to achieve this. If any tool were available to me, I'd definitely use PostGIS… but this is not an acceptable solution to my question.

Best Answer

You should install Background Geoprocessing (64-bit) from Esri, which also installs 64-bit Python. You can then call the Dice tool from a 64-bit script and process a feature class whose features have this many vertices. Since you already have ArcGIS Desktop, there is no additional cost or licensing associated with this product.

Normally, you would not find polygons with so many vertices unless the vector data has been converted from raster using unnecessary precision. However, I have created a fictional polygon with roughly 30 million vertices using Python. The script finished successfully, but it took 5 hours.

This script would also be helpful for anyone who needs to create some fictional data for testing. You can set the source min/max XY coordinates as well as the number of vertices per side, so it is quite flexible.

import os
import itertools
import numpy as np
import arcpy

min_x = 100000
max_x = 250000
min_y = 90000
max_y = 150000

# will make vertices_per_side * 3 total vertices
vertices_per_side = 1000000

# Side 1: from (min_x, min_y) to (min_x * 1.1, max_y)
first = zip(
    np.linspace(min_x, min_x * 1.1, vertices_per_side),
    np.linspace(min_y, max_y, vertices_per_side),
)

# Side 2: from (min_x * 1.1, max_y) to (max_x, max_y * 1.1)
second = zip(
    np.linspace(min_x * 1.1, max_x, vertices_per_side),
    np.linspace(max_y, max_y * 1.1, vertices_per_side),
)

# Side 3: from (max_x, max_y) to (max_x * 1.1, min_y)
third = zip(
    np.linspace(max_x, max_x * 1.1, vertices_per_side),
    np.linspace(max_y, min_y, vertices_per_side),
)

vertices = itertools.chain(first, second, third)
poly = arcpy.Polygon(
    arcpy.Array([arcpy.Point(*coords) for coords in vertices]),
    arcpy.SpatialReference(3857))
arcpy.CopyFeatures_management(poly, os.path.join(arcpy.env.scratchGDB, 'poly'))
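As the comment in the script notes, the polygon is built from vertices_per_side * 3 input vertices, so the value of 1,000,000 shown above yields about 3 million. A minimal sketch of the arithmetic for picking vertices_per_side from a target total follows; the helper name vertices_per_side_for is hypothetical and not part of the script above.

```python
def vertices_per_side_for(target_total, sides=3):
    """Return the per-side vertex count needed to reach roughly target_total.

    Hypothetical helper: the script above chains three np.linspace runs,
    so the number of input vertices is sides * vertices_per_side.
    """
    if target_total < sides:
        raise ValueError("target_total must be at least the number of sides")
    # Floor division, so the result never overshoots the target.
    return target_total // sides


if __name__ == "__main__":
    for target in (3000000, 30000000):
        print("{0} total -> {1} per side".format(
            target, vertices_per_side_for(target)))
```

The result can be assigned to vertices_per_side in the script before the three sides are built.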

Now you can create a Python script that would call only the Dice tool. You should try to set up a fairly large number of vertices for the vertex_limit parameter, though; 50,000 vertices would work fine, even 100,000 is OK. Using a smaller number such as 5,000 would significantly slow things down.
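To get a feel for what a given vertex_limit means, here is a rough sketch of the arithmetic, assuming each output feature ends up with no more than vertex_limit vertices. Every input vertex has to land in at least one output feature, so the number of pieces is at least the ceiling of total / limit. This is only a lower bound, since cutting adds new vertices along the cut lines, and the function name min_diced_parts is hypothetical.

```python
def min_diced_parts(total_vertices, vertex_limit):
    """Lower bound on the number of output features after dicing.

    Assumes no output feature exceeds vertex_limit vertices; the real
    count will be higher because cut lines introduce extra vertices.
    """
    if total_vertices <= 0 or vertex_limit <= 0:
        raise ValueError("both arguments must be positive")
    # Integer ceiling division; behaves the same on Python 2.7 and 3.
    return -(-total_vertices // vertex_limit)


if __name__ == "__main__":
    total = 21000000  # approximate vertex count mentioned in the question
    for limit in (1000000, 100000, 50000, 5000):
        print("vertex_limit={0}: at least {1} parts".format(
            limit, min_diced_parts(total, limit)))
```

A limit of 5,000 implies thousands of output features for a polygon this size, which is consistent with small limits being much slower.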

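If you are not a savvy Python user, you can just open the Windows command prompt and run the script with the 64-bit interpreter. The folder name ends with the ArcGIS version, so this example is for 10.5 and you will need to adjust it for your installation: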

c:\Python27\ArcGISx6410.5\python.exe c:\scripts\my_dice.py

Otherwise you can execute your Python script from an IDE of your choice, but make sure you have chosen the 64-bit Python interpreter. You can write a dummy Python script that just prints something to make sure arcpy behaves well with Background Geoprocessing (64-bit) installed.

import arcpy
print(arcpy.ProductInfo())
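As an extra check alongside the license printout above, you can confirm that the interpreter you launched really is the 64-bit one. This is a small sketch using only the standard library; it does not touch arcpy, and the function name interpreter_bits is just an illustrative choice.

```python
import struct
import sys


def interpreter_bits():
    """Return the pointer size of the running Python interpreter in bits."""
    # struct.calcsize("P") is the size of a C pointer in bytes.
    return struct.calcsize("P") * 8


if __name__ == "__main__":
    print("Interpreter: {0}".format(sys.executable))
    print("Bitness: {0}-bit".format(interpreter_bits()))
```

If this reports 32-bit, the script is still running under the regular ArcMap Python rather than the 64-bit install.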

If your script printed the license you have, then you are all set. Now you can start running your main script. I suggest dicing with 1 million vertices per polygon first to make sure it works, and then decreasing the number down to 50,000 if required.

import os
import arcpy

@profile
def my_func():
    arcpy.Dice_management(
        in_features=os.path.join(arcpy.env.scratchGDB, 'poly'),
        out_feature_class=
        "C:/GIS/Temp/ArcGISHomeFolder/Default.gdb/poly3mln_Diced",
        vertex_limit="500000")

if __name__ == '__main__':
    my_func()

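Running this on a polygon with 30 million vertices took 4 hours. The memory footprint is tiny, so you should not worry about it with your 16 GB of RAM. The profile below comes from the third-party memory_profiler package, which supplies the @profile decorator when the script is run with python -m memory_profiler; remove the decorator to run the script without it.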

Line #    Mem usage    Increment   Line Contents
================================================
     4  110.438 MiB  110.438 MiB   @profile
     5                             def my_func():
     6  110.441 MiB    0.004 MiB       arcpy.Dice_management(
     7  110.457 MiB    0.016 MiB           in_features=os.path.join(arcpy.env.scratchGDB, 'poly'),
     8                                     out_feature_class=
     9  110.457 MiB    0.000 MiB           "C:/GIS/Temp/ArcGISHomeFolder/Default.gdb/poly3mln_Diced",
    10  179.355 MiB   68.898 MiB           vertex_limit="500000")
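The question mentions that other features in the original feature class also have millions of vertices. One possible way to find them before dicing is to rank features by vertex count. The sketch below is only an illustration: the function names are hypothetical, the sample numbers are made up, and the arcpy part relies on arcpy.da.SearchCursor with the OID@ and SHAPE@ tokens and the geometry's pointCount property.

```python
import heapq


def largest_features(oid_counts, n=10):
    """Return the n (oid, vertex_count) pairs with the most vertices."""
    return heapq.nlargest(n, oid_counts, key=lambda pair: pair[1])


def iter_vertex_counts(feature_class):
    """Yield (OID, vertex count) for every feature; requires arcpy."""
    import arcpy  # imported lazily so the ranking works without ArcGIS
    with arcpy.da.SearchCursor(feature_class, ["OID@", "SHAPE@"]) as cursor:
        for oid, shape in cursor:
            # Features with null geometry come back as None.
            yield oid, (shape.pointCount if shape is not None else 0)


if __name__ == "__main__":
    # Made-up stand-in data so the ranking can be tried without ArcGIS.
    # With arcpy available, pass iter_vertex_counts(<feature class path>).
    sample = [(0, 21000000), (1, 1200), (2, 3400000), (3, 85)]
    for oid, count in largest_features(sample, n=2):
        print("OID {0}: {1} vertices".format(oid, count))
```

Reading full geometries through SHAPE@ may itself need a lot of memory for polygons this large, so it is probably best run under the 64-bit interpreter as well.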