[GIS] Understanding Extract Multi Values to Points performance in ArcPy

Tags: arcpy, extract, performance, raster, spatial-analyst

I am trying to extract values from multiple raster files into a point shapefile. I have about 320,000 unique XY coordinates for which I need to extract values from 37 raster files. I ran the following code, and it has now been running for three days.

import arcpy
import os
import sys  # needed for sys.exit below
from arcpy.sa import *

if arcpy.CheckExtension("spatial") == "Unavailable":
    sys.exit("No ArcGIS Spatial Analyst licence available - exiting")
else:
    arcpy.CheckOutExtension("spatial")
#     arcpy.CheckInExtension("spatial")

arcpy.env.workspace = r"H:\GIS Project\in_rasters"
arcpy.env.overwriteOutput = True
arcpy.env.parallelProcessingFactor = "50"
point_feature_class = r"H:\GIS Project\XY_Points.shp"
rasters = arcpy.ListRasters("*","TIF")


for raster in rasters:
    ExtractMultiValuesToPoints(point_feature_class, raster, 'NONE')

My computer has enough RAM (16 GB) and I am running it in parallel. Based on the post "Getting raster values for large number of point features?", I would have expected this function to take a couple of hours at most. That post is not exactly the same case, but it does give some anecdotal evidence of processing times with large data.

My raster files all use the same coordinate system (WGS 84); however, they do not have the same resolution.

Is it normal for this function to take a long time to finish?

Best Answer

According to the help for this tool, the parallel processing factor environment isn't honored. Not many tools support parallel processing.

You might get a performance increase by running all rasters at the same time but on a smaller chunk of points:

import arcpy
import os
import sys  # needed for sys.exit below
from arcpy.sa import *

if arcpy.CheckExtension("spatial") == "Unavailable":
    sys.exit("No ArcGIS Spatial Analyst licence available - exiting")
else:
    arcpy.CheckOutExtension("spatial")
#     arcpy.CheckInExtension("spatial")

arcpy.env.workspace = r"H:\GIS Project\in_rasters"
arcpy.env.overwriteOutput = True
arcpy.env.parallelProcessingFactor = "50" # not used by this tool
point_feature_class = r"H:\GIS Project\XY_Points.shp"
rasters = arcpy.ListRasters("*","TIF")

# build a list of your rasters
AllRasters = [] # an empty list
for raster in rasters:
    fName, fExt = os.path.splitext(raster)
    AllRasters.append([raster,fName])

FeatCount = int(arcpy.GetCount_management(point_feature_class).getOutput(0))
MemFCList = []
ChunkSize = 1000 # how many features to do at the same time
CountList = range(0,FeatCount,ChunkSize)

for StartValue in CountList:
    arcpy.AddMessage('Running chunk {}'.format(StartValue))
    MemFC = r'in_memory\pfc_{}'.format(StartValue)  # raw string keeps the backslash literal
    MemFCList.append(MemFC)
    # select a chunk of data
    arcpy.Select_analysis(point_feature_class,MemFC,'FID >= {} AND FID < {}'.format(StartValue,StartValue + ChunkSize))
    # run the tool on all rasters at once.
    ExtractMultiValuesToPoints(MemFC, AllRasters, 'NONE')

# replace the original by merging the chunks
arcpy.Merge_management(MemFCList,point_feature_class)

Each chunk of ChunkSize features (1000 in the script above) is copied into the in_memory workspace. This should speed things up if the point features sit in a slow workspace (a slow drive or network storage), but it will not help if the rasters themselves are on a slow drive or network storage.
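The chunk boundaries can be checked without ArcGIS before committing to a long run. This is only a minimal sketch: the function name chunk_where_clauses is hypothetical, and it assumes the FID values start at 0 and are contiguous, which is normally true for a shapefile but is worth verifying.

```python
def chunk_where_clauses(feature_count, chunk_size, id_field="FID"):
    """Return one SQL where clause per chunk of consecutive IDs.

    Assumes IDs start at 0 and are contiguous (an assumption, see above).
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    clauses = []
    for start in range(0, feature_count, chunk_size):
        end = start + chunk_size
        clauses.append("{0} >= {1} AND {0} < {2}".format(id_field, start, end))
    return clauses


# Example: 2,500 features in chunks of 1,000 gives three clauses,
# the last of which simply matches fewer features.
clauses = chunk_where_clauses(2500, 1000)
for clause in clauses:
    print(clause)
```

With the 320,000 points from the question and a chunk size of 1000, this produces 320 clauses, so the tool would be called 320 times. Whether a larger chunk size is faster overall would have to be measured.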

A few things to check first:

Repair geometry on your input points; there could be some dud points that are gumming up the works.
Ensure your rasters are on a local, preferably fast, drive. Accessing rasters on slow laptop drives, USB 2 drives and network/cloud drives will all make this process tediously slow.
Ensure the rasters are uncompressed. Having to decompress your rasters constantly uses CPU cycles that are better used elsewhere.
Ensure your rasters aren't in a slow format like ASCII or XYZ. Both of these are rubbish for processing; if you do have either of these formats, converting to GeoTIFF or ERDAS IMG will reduce your processing time notably.
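The last check can be roughly automated by looking at file extensions. The sketch below is an illustration only: the function name find_slow_format_rasters and the extension set are assumptions, and an extension is only a hint about the real format.

```python
import os

# Extensions treated as slow text-based formats. This set is an assumption
# for illustration, based on the ASCII and XYZ formats named above.
SLOW_EXTENSIONS = {".asc", ".xyz"}


def find_slow_format_rasters(raster_paths):
    """Return the paths whose extension suggests a text-based raster format."""
    slow = []
    for path in raster_paths:
        ext = os.path.splitext(path)[1].lower()
        if ext in SLOW_EXTENSIONS:
            slow.append(path)
    return slow


# Hypothetical file names, not taken from the question.
example = ["dem.tif", "rain.ASC", "soil.img", "points.xyz"]
flagged = find_slow_format_rasters(example)
print(flagged)
```

The script in the question already lists only TIF rasters, so this matters mainly when a folder holds a mix of formats.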

Related Solutions

[GIS] Renaming 1,200 rasters in File Geodatabase using ArcPy

I recently had a similar task. It seems arcpy.Rename_management() is slow by nature. I did find that adding arcpy.env.addOutputsToMap = False improves performance ever-so-slightly, as ArcMap doesn't have to spend time drawing the rasters as they're added. Alternatively, the script can be run from ArcCatalog.

I added additional timers and print statements to your script (see the example outputs below; my virtual machine was running extra slow, so the times are exaggerated):

import arcpy, timeit

arcpy.env.addOutputsToMap = False

def renameRasters(FGDB):
    origdir = arcpy.env.workspace
    arcpy.env.workspace = FGDB

    for rstr in arcpy.ListRasters("*"):
        rstrTimerStart = timeit.default_timer()
        try:
            newrstr = "G" + "2015" + rstr.replace("G2015G2014", "G2015")
            arcpy.Rename_management(rstr, newrstr)
            printMsg = "--Raster renamed " + newrstr + ". Timer: "

        except Exception as e:
            printMsg = "--Rename failed: " + rstr + ". Exception: " + str(e).replace("\n","") + "  Timer: "
        
        rstrTimerStop = timeit.default_timer()
        print printMsg + str(rstrTimerStop - rstrTimerStart) + " seconds."

    arcpy.env.workspace = origdir
    return None

start = timeit.default_timer()

print "Begin Raster Rename Function"
renameRasters(r"C:\Temp\Test_RasterRename.gdb")
print "Raster Rename Function Complete"

stop = timeit.default_timer()
print "Took " + str(stop - start) + " seconds to complete."

[Screenshots of example output omitted. The three runs shown were: addOutputsToMap = True from network drive; addOutputsToMap = False from network drive; addOutputsToMap = False from local drive.]

[GIS] Why does clipping a raster alter cell size

I have used the GDAL tools to rasterize a vector. Providing the extent of my desired raster worked, but if the values were very specific (i.e. lots of decimal places) I still didn't obtain the extent I needed. So I tried different things, which worked in different ways:

a) I modified my rasters to the desired pixel size. When I created a raster from a vector, I followed this tutorial: http://www.mikemeredith.net/blog/1212_GIS_layer_for_Distance_from_in_QGIS.htm It shows the editable box where the pixel size is entered. It also shows how to provide the extent, but because my extent values were very "long", that worked only for the pixel size and not for the extent.

b) If you are using QGIS and are preparing your environmental layers for Maxent, I suggest using the QSDM plugin to unify all your layers. All layers need to be in the same format (.tiff) and the same CRS, and it helps if they have the same resolution. The plugin writes them out as new, "unified" layers with an equal extent. Even if they don't look the same, checking the metadata will confirm that they have been modified to have the same extent and resolution, ready to be used in Maxent.

c) Additionally, there is a nice tutorial for Maxent and QGIS here; check out the "Creating new rasters with GDAL tools" section, which covers resampling. This is great if you want to change the pixel size of your environmental layers. http://clp-foss4g-workshop.readthedocs.org/en/latest/qgis_raster_resample.html
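One possible reason a requested extent is not reproduced exactly is that its width or height is not a whole multiple of the pixel size, so either the extent or the cell size has to give. That cause is an assumption here, not something confirmed for GDAL or QGIS. As a sketch, the hypothetical helper snap_extent below expands an extent outward until every edge falls on a multiple of the pixel size, which yields a whole number of rows and columns:

```python
import math


def snap_extent(xmin, ymin, xmax, ymax, pixel_size):
    """Expand an extent outward so each edge is a multiple of pixel_size.

    Returns the snapped extent and the resulting column and row counts.
    """
    if pixel_size <= 0:
        raise ValueError("pixel_size must be positive")
    # Work in whole-pixel indices to avoid accumulating rounding error.
    col_min = int(math.floor(xmin / pixel_size))
    row_min = int(math.floor(ymin / pixel_size))
    col_max = int(math.ceil(xmax / pixel_size))
    row_max = int(math.ceil(ymax / pixel_size))
    extent = (col_min * pixel_size, row_min * pixel_size,
              col_max * pixel_size, row_max * pixel_size)
    return extent, col_max - col_min, row_max - row_min


# Hypothetical coordinates with "long" decimals and a 0.5-unit pixel.
extent, cols, rows = snap_extent(10.123456, 20.987654, 12.3, 22.1, 0.5)
print(extent, cols, rows)
```

Passing the snapped extent together with the pixel size to whichever tool creates the raster should leave less room for the tool to adjust either value, though this has not been tested against the tools mentioned above.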

I hope it works. If other ways have worked for you, I would be happy to hear about them, since for us beginners it is often a lot of trial and error.