[GIS] Understanding Extract Multi Values to Points performance in ArcPy

arcpyextractperformancerasterspatial-analyst

I am trying to extract points from multiple raster files into a points shapefile. I have about 320,000 unique XY coordinates which need to extracted values from 37 raster files. I tried running the following code and it has been three days since it has started.

import arcpy
import os
from arcpy.sa import *

if arcpy.CheckExtension("spatial") == "Unavailable":
    sys.exit("No ArcGIS Spatial Analyst licence available - exiting")
else:
     arcpy.CheckOutExtension("spatial")
#     arcpy.CheckInExtension("spatial")

arcpy.env.workspace = r"H:\GIS Project\in_rasters"
arcpy.env.overwriteOutput = True
arcpy.env.parallelProcessingFactor = "50"
point_feature_class = r"H:\GIS Project\XY_Points.shp"
rasters = arcpy.ListRasters("*","TIF")


for raster in rasters:
    ExtractMultiValuesToPoints(point_feature_class, raster, 'NONE')

My computer has enough ram (16 gigs) and I am running it in parallel. I would think this function would take a couple of hours maximum based off this post Getting raster values for large number of point features?. Albeit, this post is not exactly the same, it still shows some anecdotal evidence for processing time with large data.

My raster files are all projected the same (WGS 84) however they do not have the same resolution.

Is it normal for this function to take a long time to finish?

Best Answer

According to the help on this tool the environment parallel processing factor isn't considered.. not many tools do parallel process.

You might get a performance increase by running all rasters at the same time but on a smaller chunk of points:

import arcpy
import os
from arcpy.sa import *

if arcpy.CheckExtension("spatial") == "Unavailable":
    sys.exit("No ArcGIS Spatial Analyst licence available - exiting")
else:
     arcpy.CheckOutExtension("spatial")
#     arcpy.CheckInExtension("spatial")

arcpy.env.workspace = r"H:\GIS Project\in_rasters"
arcpy.env.overwriteOutput = True
arcpy.env.parallelProcessingFactor = "50" # not used by this tool
point_feature_class = r"H:\GIS Project\XY_Points.shp"
rasters = arcpy.ListRasters("*","TIF")

# build a list of your rasters
AllRasters = [] # an empty list
for raster in rasters:
    fName, fExt = os.path.splitext(raster)
    AllRasters.append([raster,fName])

FeatCount = int(arcpy.GetCount_management(point_feature_class).getOutput(0))
MemFCList = []
ChunkSize = 1000 # how many features to do at the same time
CountList = range(0,FeatCount,ChunkSize)

for StartValue in CountList:
    arcpy.AddMessage('Running chunk {}'.format(StartValue))
    MemFC = 'in_memory\pfc_{}'.format(StartValue)
    MemFCList.append(MemFC)
    # select a chunk of data
    arcpy.Select_analysis(point_feature_class,MemFC,'FID >= {} AND FID < {}'.format(StartValue,StartValue + ChunkSize))
    # run the tool on all rasters at once.
    ExtractMultiValuesToPoints(MemFC, AllRasters, 'NONE')

# replace the original by merging the chunks
arcpy.Merge_management(MemFCList,point_feature_class)

Each chunk of chunksize (default 1000) is copied into the in_memory workspace which should speed things up if you're accessing the features from a slow workspace (either a slow drive or network storage) though it will not help if your rasters are on a slow drive or network storage.

A few things to check first:

  1. Repair geometry on your input points, there could be some dud points that are gumming up the works.
  2. Ensure your rasters are on a local, preferable fast, drive. Accessing rasters on slow laptop drives, USB 2 and network/cloud drives will all make this process tediously slow.
  3. The rasters are uncompressed. Having to decompress your rasters constantly uses CPU cycles that are better used elsewhere.
  4. Your rasters aren't in a slow format like ASCII or XYZ. Both of these are rubbish for processing, if you do have either of these formats then converting to GeoTIFF or ERDAS IMG will reduce your processing time notably.
Related Question