[GIS] In_memory workspace in geoprocessing services

Tags: arcgis-server, arcpy, geoprocessing-service, in-memory

I have an arcpy Python script that I publish as a geoprocessing service on ArcGIS Server 10.3.1. The script uses various input feature classes. These feature classes live in geodatabases that are registered with the ArcGIS Server.

My Python script also uses the in_memory workspace to store some temporary data. The in_memory feature classes are not the input or the final output; they are only intermediate data.

When I try to publish the geoprocessing service, the publishing tools want to copy the in_memory feature classes to the server, since the in_memory workspace is not registered with ArcGIS Server. The specific warning I get is: *Data source used by Script MyToolThatDoesStuff is not registered with the server and will be copied to the server: in_memory\myTempData*

I let the publishing tools copy the in_memory data to the server, and my geoprocessing service works as expected. However, I suspect that every time the tool is run, the in_memory data is copied into the arcgisserver\directories\arcgissystem\arcgisinput\MyToolThatDoesStuff.GPServer\extracted\v101\data1.gdb geodatabase and never removed. Over time this geodatabase bloats, slows down the geoprocessing service, and ultimately fills the disk, causing major problems.
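One way to check whether this suspicion holds is simply to watch the size of the service's input directory over a few tool runs. Here is a minimal, hedged sketch of a folder-size check in plain Python; the path in the usage comment is hypothetical and should be pointed at your own server directory:

```python
import os

def folder_size_mb(path):
    """Sum the size of every file under *path*, in megabytes."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            fp = os.path.join(root, name)
            if os.path.isfile(fp):
                total += os.path.getsize(fp)
    return total / (1024.0 * 1024.0)

# Hypothetical usage -- substitute your actual server install path:
# size = folder_size_mb(r"C:\arcgisserver\directories\arcgissystem\arcgisinput")
# print("arcgisinput directory: %.1f MB" % size)
```

If the number grows after every run, the intermediate data really is being persisted; if it stays flat, the bloat concern is unfounded.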

My questions:

  1. Is there a way to prevent the publishing tools from copying
    the in_memory workspace to the server?
  2. Is there a way to register the in_memory workspace with ArcGIS Server?
  3. Is using the in_memory workspace in scripts that are published as geoprocessing services not “best practice”? If so, how should temporary, intermediate data be handled in arcpy scripts?

Best Answer

In_memory layers are not written to disk; that is the whole point of the in_memory workspace.

  1. The in_memory workspace is not published to the server.
  2. Registering in_memory on the server wouldn't make sense.
  3. Using in_memory is best practice for handling intermediate data, in both desktop and server geoprocessing workflows.

For publishing purposes, I usually suggest doing a dry run of the tool, in which no code is executed and no data is created. Publish the resulting tool result, and then either uncomment the real code in the published script file, or use a tool parameter whose value defines whether it is a dry run.
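The dry-run idea above can be sketched as follows. This is a minimal, hedged illustration, not ArcGIS API code: `run_tool`, its parameters, and the step strings are all hypothetical stand-ins for your real arcpy calls.

```python
def run_tool(input_fc, output_fc, dry_run=False):
    """Sketch of a GP script guarded by a dry-run flag.

    With dry_run=True (e.g. while capturing a result to publish),
    nothing executes and no data is created. The real geoprocessing
    only happens on the server with dry_run=False.
    """
    steps = []
    if dry_run:
        return steps  # no code executed, no data created

    # The real work would go here -- e.g. arcpy calls writing
    # intermediate feature classes into the in_memory workspace:
    steps.append("copy %s to in_memory" % input_fc)
    steps.append("analyze intermediate in_memory data")
    steps.append("write final result to %s" % output_fc)
    return steps
```

Running the tool once with `dry_run=True` gives you a clean result to publish, with no in_memory data for the publishing tools to flag.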

Even better, keep the published script (the ArcGIS bits: parameter handling and so on) separate from the business code. Please refer to this post for best practices that I found very useful when working with GP services.
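That separation might look like the sketch below. The business rule (`classify_parcel`) and the parameter-accessor injection are hypothetical illustrations; in a real published script the accessors would be `arcpy.GetParameterAsText` and `arcpy.SetParameterAsText`, and only the thin wrapper would ever need republishing.

```python
# --- business module: plain Python, testable outside ArcGIS ---------
def classify_parcel(area_sq_m):
    """Hypothetical business rule the tool delegates to."""
    if area_sq_m < 500:
        return "small"
    if area_sq_m < 5000:
        return "medium"
    return "large"

# --- published script: parameter plumbing only ----------------------
def gp_entry_point(read_param, write_param):
    """Thin wrapper the GP service publishes. In the real script,
    read_param/write_param would be arcpy.GetParameterAsText and
    arcpy.SetParameterAsText; here they are injected so the wrapper
    can be exercised without ArcGIS."""
    area = float(read_param(0))
    write_param(1, classify_parcel(area))
```

Because the business logic never touches arcpy's parameter machinery, it can be unit-tested and reused without republishing the service.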
