[GIS] How to speed up batch watershed delineation

algorithm, arcgis-10.0, arcgis-desktop, hydrology

I have a large number of pour points (approximately 250,000) for which I want to delineate the upstream watershed area using ArcHydro's "Batch Watershed Delineation". It takes roughly 4-5 hours to delineate 1,000 points on my system, so I need a faster way than running these as individual processes on my computer. I am currently testing 4 sets of 1,000 running side by side on my 64-bit Windows i7 system, which will likely speed things up, but I think the process could be sped up even more with access to a "Condor"-type system that runs many instances of a program at once.

The catch is that I need to drive ArcGIS from the command line, and get ArcHydro to run the upstream watershed delineation from the command line as well. I know which files I require, so I would likely just need to know whether it is possible to extract the code that ArcHydro uses to define the entire upstream catchment area of a point. I can't find it easily within the ArcGIS GUI.

Would anyone know of a way to identify or extract the code that ArcHydro uses to run "Batch Watershed Delineation"? If so, would you also know how to call ArcGIS from the command line to run that process, without opening the ArcGIS application?

Best Answer

In a conversation following the question, we refined the objective. It is to obtain

"the watershed area of each point. Some of these points are close to each other, but they are all unique and we would like to determine the watershed area for each point. We will use the watershed area in a predictive model to distinguish a score for each point."

The proposed solution, which worked in this case (and was far faster than the batch watershed delineation), is to run a FlowAccumulation calculation using a constant grid as the weight input, with its values set to the area of a single cell. Then simply read off the resulting values at each of the pour points.
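Since the question also asked how to do this without opening ArcGIS, here is a minimal sketch of how the steps might be scripted with arcpy and run from a command prompt. It assumes a flow direction grid already exists (ArcHydro produces one during terrain preprocessing), that a Spatial Analyst licence is available, and that every pour point sits on the intended stream cell. The function name and its three path arguments are hypothetical, and the sketch has not been tested against a particular ArcGIS release.

```python
def watershed_area_at_points(flow_dir_path, points_path, out_points_path):
    """Copy the pour points to out_points_path with the upstream drainage
    area of each point stored in the RASTERVALU field.

    All three arguments are hypothetical dataset paths.
    """
    # arcpy ships with ArcGIS; importing it inside the function keeps this
    # module importable on machines without ArcGIS installed.
    import arcpy
    from arcpy.sa import (CreateConstantRaster, ExtractValuesToPoints,
                          FlowAccumulation, Raster)

    arcpy.CheckOutExtension("Spatial")  # Spatial Analyst licence is required

    flow_dir = Raster(flow_dir_path)

    # Make the constant grid line up exactly with the flow direction grid.
    arcpy.env.extent = flow_dir.extent
    arcpy.env.cellSize = flow_dir.meanCellWidth
    arcpy.env.snapRaster = flow_dir_path

    # Constant weight grid: every cell holds the area of one cell.
    cell_area = flow_dir.meanCellWidth * flow_dir.meanCellHeight
    weight = CreateConstantRaster(cell_area, "FLOAT")

    # Accumulated weight = total area of the cells draining into each cell.
    area = FlowAccumulation(flow_dir, weight, "FLOAT")

    # Read the area grid at every pour point in a single pass.
    ExtractValuesToPoints(points_path, area, out_points_path)

    arcpy.CheckInExtension("Spatial")
    return out_points_path
```

Saved in a script and called with real paths, this runs under the Python interpreter installed with ArcGIS, so no ArcMap session is needed. The accumulation is computed once for the whole grid, and each additional point costs only a raster lookup.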

To understand this, note that the total area draining to a point is (obviously) equivalent to "total number of cells draining to the point" times "area of a cell." The former is precisely what FlowAccumulation calculates when it is given a unit grid as input.
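The identity can be checked on a toy drainage network. The sketch below is plain Python rather than ArcGIS code: `flow_accumulation`, the six-cell network, and the 30 m cell size are all made up for illustration. It follows the FlowAccumulation convention of summing the weights of the cells upstream of each cell, with the cell itself excluded.

```python
from collections import deque


def flow_accumulation(downstream, weights):
    """Sum, for every cell, the weights of all cells upstream of it.

    downstream[i] is the index of the cell that cell i drains into, or
    None if cell i is an outlet. The cell's own weight is not included.
    """
    n = len(downstream)
    acc = [0.0] * n

    # Count how many cells drain directly into each cell.
    pending = [0] * n
    for d in downstream:
        if d is not None:
            pending[d] += 1

    # Start at cells with nothing upstream and work downhill, passing
    # each cell's accumulated total plus its own weight to its receiver.
    queue = deque(i for i in range(n) if pending[i] == 0)
    while queue:
        i = queue.popleft()
        d = downstream[i]
        if d is None:
            continue
        acc[d] += acc[i] + weights[i]
        pending[d] -= 1
        if pending[d] == 0:
            queue.append(d)
    return acc


# Toy network: cells 0 and 1 drain to 2; cells 2 and 3 drain to 4;
# cell 4 drains to 5, which is the outlet.
downstream = [2, 2, 4, 4, 5, None]
cell_area = 30.0 * 30.0  # hypothetical 30 m cells

counts = flow_accumulation(downstream, [1] * 6)          # unit grid
areas = flow_accumulation(downstream, [cell_area] * 6)   # area grid

# Weighting by cell area gives the same result as count times cell area.
assert areas == [c * cell_area for c in counts]
```

Because the cell itself is excluded, a point's own cell can be counted by adding one cell area to the value read off at that point.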

An important insight here is that sometimes (frequently, in my experience) we can accomplish a compute-intensive task more efficiently by focusing on our ultimate objective rather than on each specific technical step in a workflow, and then looking for an accurate and fast algorithm that reaches that objective. Adding more processors, getting speedier disks, etc., can speed up a long operation, but only by a limited amount. Changing the algorithm altogether can improve it by orders of magnitude.