I'll quote some references from Dave Peters' System Design Strategies wiki, which I recommend reading in full to understand how complex this question is to answer. I would also recommend checking the relevant version of the web help on tuning services.
I think this is actually a really good question, albeit a little vague, as it is asked quite often.
I'll try to come back to this question over time to beef up the answer.
Happy for it to become a community wiki if people want to improve my answer.
What are Service Instances?
Service instance is a service configuration parameter that identifies the minimum and maximum number of process threads that will be deployed by ArcGIS for Server to satisfy inbound web service requests.
It should not be confused with the install instance at v9.3.1 and 10 of ArcGIS Server, which, to avoid confusion, was renamed to GIS Server site at v10.1.
- Minimum number of specified service instances will be deployed during server startup.
- Additional service instances will be deployed by the service manager based on service request demands up to the maximum specified service configuration.
These instances run on the container machines (peers in your ArcGIS Site at 10.1). If the service is high isolation, each instance runs as its own process. Low isolation allows multiple instances to share a process, which is usually recommended, as multi-threading makes better use of memory (although if a process crashes, multiple jobs could be lost). With low isolation, between 8 and 24 instances from the same service can share a process.
![enter image description here](https://i.stack.imgur.com/kv3sX.png)
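As a rough illustration of the isolation trade-off described above, here is a minimal sketch that counts how many processes a given number of instances would occupy. The function name and the `instances_per_process` parameter are hypothetical, and the default of 8 is only the lower bound quoted above, not a value read from any server.

```python
import math

def process_count(instances, isolation, instances_per_process=8):
    """Estimate how many OS processes a service's instances occupy.

    isolation: "high" (one process per instance) or "low" (shared processes).
    instances_per_process: assumed sharing factor for low isolation.
    """
    if isolation == "high":
        return instances
    if isolation == "low":
        # Instances are packed into shared processes; round up for a partial one.
        return math.ceil(instances / instances_per_process)
    raise ValueError("isolation must be 'high' or 'low'")
```

For example, 10 instances would take 10 processes at high isolation but only 2 at low isolation with 8 instances per process, which is where the memory saving comes from.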
What's an optimal setting?
It is important to identify the proper instance configuration for each map service deployment. Proper service instance configurations depend on the expected peak service demands and the server machine core processor configuration.
An application that uses an instance will only use it for the amount of time it takes to complete a request. After the request is completed, the instance is released back to the pool for someone else to use.
When the maximum number of instances of a service is in use, a client requesting the service is queued until another client releases an instance. The time between a client requesting the service and getting an instance is the wait time.
You can inspect your logs and ArcGIS Server Statistics (no longer there at 10.1) to determine which services are more popular and need more instances dedicated to them.
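The queueing behaviour described above can be sketched with a small model. This is not how ArcGIS Server is implemented; it assumes first-come-first-served queueing and a constant time per request, and the function and parameter names are made up for illustration.

```python
import heapq

def wait_times(arrivals, service_time, max_instances):
    """Return how long each request waits for a free instance.

    arrivals: request arrival times in seconds, sorted ascending.
    service_time: seconds an instance stays busy per request (assumed constant).
    max_instances: size of the instance pool for the service.
    """
    # Min-heap of the times at which each instance next becomes free.
    free_at = [0.0] * max_instances
    heapq.heapify(free_at)
    waits = []
    for t in arrivals:
        earliest = heapq.heappop(free_at)   # instance that frees up first
        start = max(t, earliest)            # queue if no instance is free yet
        waits.append(start - t)
        heapq.heappush(free_at, start + service_time)
    return waits
```

With three simultaneous requests, a one-second request time and a maximum of two instances, the third request waits one second; raising the maximum to three removes the wait.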
Dave Peters' general rule, which is the short answer to this question:
The Maximum instances should provide one more instance than available server machine cores, i.e. N+1 instances where N = number of server cores.
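The N+1 rule is simple enough to express as a helper. This is only a sketch: the function name is hypothetical, and `os.cpu_count()` reports logical cores on the machine running the script, which may differ from the physical core count of your server, so pass the real number explicitly where you know it.

```python
import os

def suggested_max_instances(cores=None):
    """Apply the N+1 rule: one more instance than server cores."""
    if cores is None:
        # Fall back to the local machine's logical core count (may be None).
        cores = os.cpu_count() or 1
    return cores + 1
```

For a 4-core server, `suggested_max_instances(4)` gives 5 as a starting point for the maximum, to be adjusted against observed demand.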
I would highly recommend reading this straight from the wiki and adjusting these settings with care. If you need more specific answers for a certain scenario, you will need to raise that in a different question.
It doesn't matter as long as you access the Manager from that machine. To access it from a different machine, you need to use either the fully qualified domain
(http://mywebdomain.com/arcgis/manager)
or, from inside your network, the machine name.
Port 6080 is required only when you don't have the Web Adaptor installed.
If you don't have a .com registered, you will need to do that.
If your company has a .com, you can (if IT will allow it) register a subdomain such as gis.mycompanywebsite.com, where gis is the subdomain part. IT can probably take care of that for you.
Once you have the domain or subdomain, you don't need to do anything else (except wait for it to propagate). You will just call that external IP or URL in your JavaScript.
Comments are OK, but try to keep them to a minimum. You can edit your original question to add more information.
http://gis.mycompanywebsite.com/arcgis/rest/services/101/my_service_name/MapServer
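To make the URL pattern concrete, here is a small sketch that builds a REST endpoint with or without port 6080. The helper name and its parameters are hypothetical, and it assumes the Web Adaptor is registered under the default name `arcgis`; adjust the path if yours is named differently.

```python
def rest_service_url(host, service, folder=None, web_adaptor=True,
                     service_type="MapServer"):
    """Build a service REST URL for a host name or fully qualified domain.

    web_adaptor: when False, the request goes straight to the GIS Server,
    so port 6080 has to be included in the URL.
    """
    base = f"http://{host}/arcgis" if web_adaptor else f"http://{host}:6080/arcgis"
    parts = [base, "rest", "services"]
    if folder:
        parts.append(folder)          # optional services folder, e.g. "101"
    parts += [service, service_type]
    return "/".join(parts)
```

Calling `rest_service_url("gis.mycompanywebsite.com", "my_service_name", folder="101")` reproduces the URL above, while `web_adaptor=False` produces the `:6080` form for use inside the network.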
Instances in Server 10.1 are pooled by default. Once a request is completed, the instance is returned to the pool and not held by the requester. Given that, your 1000ms delay might be too much time in between requests. 2 instances might be enough to handle that much volume. Try reducing your delay between requests and see if the number in use goes up.
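A back-of-the-envelope estimate shows why a long delay keeps the in-use count low. The sketch below assumes each test client sends a request, waits for the response, pauses for the delay, and repeats; the function name and the example timings are assumptions, not measurements from your server.

```python
def avg_instances_in_use(clients, service_seconds, delay_seconds):
    """Estimate the average number of busy instances.

    Each client keeps an instance busy for service_seconds out of every
    (service_seconds + delay_seconds) cycle, so the busy share per client
    is service / (service + delay).
    """
    cycle = service_seconds + delay_seconds
    return clients * service_seconds / cycle
```

With 4 clients, an assumed 0.25 s per request and a 1 s delay, the average is about 0.8 instances in use; dropping the delay to zero pushes it to 4, which is the effect you would look for when reducing the delay.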
Also, is this a single-machine cluster with 4 instances per machine, or is it a different configuration? If it is a two-machine cluster with 2 instances per machine, it is possible that one of the machines is unable to fulfill the requests for some reason. If that is the case, you should see logged errors for that machine, and you will never see more than 2 instances in use.