[GIS] Performing health checks on ArcGIS Server

arcgis-server

Consider an ArcGIS Server setup in an enterprise environment, i.e. multiple SOCs, perhaps a fail-over setup, a separate web server, SDE/DBMS on a separate machine, etc.

Without going into the specifics of versions, software, operating systems, etc., I would like to know what people would recommend for performing "health checks" on a setup of this scale. Or perhaps "diagnostics" is a better word?

I was thinking it would be a good idea to run monthly checks (as opposed to constant monitoring) to ensure everything is running smoothly and perhaps to identify bottlenecks or problem areas in the setup. Ideally there would be a specific workflow that could be easily repeated, so that historical data can be gathered to see whether the setup has deteriorated over time.

I hope this is not too subjective a question, but I think there will be experts out there that will have "right" answers to this, and perhaps any discussion can be done via comments and deleted as need be?

To make the question more specific, please assume:

  • SDE has been set up optimally.
  • ArcGIS Server services have also been set up optimally (i.e. cached where appropriate, scale ranges/definition queries, etc.).

I was thinking of putting together a custom application that sits on the webserver and allows a user to hit a button which would do things like:

  • ping each endpoint (each IP, check that the XML from the Server WSDL is OK, various REST endpoints)
    • Report a pass/fail result for each of these tests
    • Perhaps repeat these pings, and show an average response time for each endpoint.

These tests could be performed in off-peak hours, and then generate a basic report on the outcomes.
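To make the idea concrete, here is a minimal Python sketch of such a check, not a finished tool. It assumes each endpoint answers a plain HTTP GET and that a 200 status counts as a pass; the function names (`fetch_status`, `check_endpoint`, `format_report`) and the example URLs in the comments are made up for illustration. Some services may report errors inside a 200 response body, so a real check would probably need to inspect the payload as well.

```python
import statistics
import time
import urllib.request


def fetch_status(url, timeout=10.0):
    """Issue a GET request and return the HTTP status code."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.status


def check_endpoint(url, attempts=5, fetch=fetch_status):
    """Request `url` several times and summarise the outcome.

    `fetch` is injectable so the check can be exercised without a network.
    """
    timings = []
    failures = 0
    for _ in range(attempts):
        start = time.perf_counter()
        try:
            ok = fetch(url) == 200
        except Exception:
            # Timeouts, DNS errors and HTTP error statuses all count as a fail.
            ok = False
        elapsed = time.perf_counter() - start
        if ok:
            timings.append(elapsed)
        else:
            failures += 1
    return {
        "url": url,
        "passed": failures == 0,
        "failures": failures,
        # Average over successful attempts only; None if every attempt failed.
        "avg_seconds": statistics.mean(timings) if timings else None,
    }


def format_report(results):
    """Render one line per endpoint: PASS/FAIL, average time, URL."""
    lines = []
    for r in results:
        status = "PASS" if r["passed"] else "FAIL"
        avg = "n/a" if r["avg_seconds"] is None else f"{r['avg_seconds']:.3f}s"
        lines.append(f"{status}  {avg:>8}  {r['url']}")
    return "\n".join(lines)


# Example usage (hypothetical host names, replace with your own endpoints):
# endpoints = [
#     "http://gisserver.example.com/arcgis/rest/services?f=json",
#     "http://gisserver.example.com/arcgis/services?wsdl",
# ]
# print(format_report([check_endpoint(u) for u in endpoints]))
```

Scheduling the script with cron or Windows Task Scheduler would cover the off-peak requirement, and writing the result dictionaries to a file or table each run would give the historical record.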

I guess you could also swap out ArcGIS Server for any server technology (which got me thinking that perhaps this belongs on Server Fault).

I know ArcGIS Server has logging and statistics capabilities.
I should also point out that automatic alerts have already been put in place to notify when servers go down or are performing very badly. I am really after some advice on what to test/identify in terms of diagnosing if the overall system is "healthy" (i.e. Is it running ok, is it worse than last month, can something be improved?)
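For the "is it worse than last month?" part, one possible approach is to keep the average response time per endpoint from each monthly run and flag anything that has slowed by more than some tolerance. The sketch below assumes the averages are already available as dictionaries keyed by endpoint; `find_regressions` and the 20% default tolerance are illustrative choices, not an established rule.

```python
def find_regressions(previous, current, tolerance=0.20):
    """Return endpoints whose average response time grew by more than `tolerance`.

    `previous` and `current` map endpoint name -> average seconds.
    Each result is (endpoint, before, now, relative_change), worst first.
    """
    regressions = []
    for endpoint, now in current.items():
        before = previous.get(endpoint)
        if before is None or before <= 0:
            # No usable baseline (new endpoint or bad data): skip it.
            continue
        change = (now - before) / before
        if change > tolerance:
            regressions.append((endpoint, before, now, change))
    return sorted(regressions, key=lambda item: item[3], reverse=True)


# Example usage with made-up numbers:
# last_month = {"rest/services": 0.10, "MapServer/export": 0.50}
# this_month = {"rest/services": 0.11, "MapServer/export": 0.75}
# for name, before, now, change in find_regressions(last_month, this_month):
#     print(f"{name}: {before:.2f}s -> {now:.2f}s (+{change:.0%})")
```

A relative threshold is used here so that fast and slow endpoints are judged on the same footing; an absolute threshold in seconds may suit some setups better.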

I would be interested to know what people (who are experienced in multi-tier/high use setups) think of this.

Best Answer

Latitude Geographics developed Geocortex Optimizer for just this purpose. It's a program that installs as a service, and performs periodic monitoring of your ArcGIS Services and the servers behind them (by performing ping requests, web requests, map requests, and also by monitoring log files and performance counters).

There are also API hooks that let you connect a web viewer to the Optimizer collectors, so you can get info on how your web viewer is being used (which extents, which tools, user activity, etc.).

The data it collects gets pushed into a database and there is a reporting module that analyzes the data and presents the results as a web page. There are graphs and heat maps to help visually represent trends and usage. You can also have some reports emailed periodically.

DISCLAIMER: I work at Latitude Geographics, although not directly with the Optimizer product.