[GIS] Are personal geodatabases better suited for quickly querying indexed attributes than file geodatabases

arcgis-engineattribute-indexfile-geodatabasepersonal-geodatabasequery

I'm preparing data for an ArcGIS Engine application that queries the data to search for an address. Sometimes we search just on the street name field, just on the house number field, or both. When using personal geodatabases or SDE geodatabases, one can add a multi-column attribute index in addition to single-column indexes. For some reason, according to the Creating attribute indexes ESRI article, multi-column attribute indexes are not possible when using file geodatabases. They don't mention why this is the case – maybe file geodatabases don't need them for some reason?

A multi-column index on the house number field and the street name field should theoretically improve my query performance when searching both fields at once, but is it worth switching over to using a personal geodatabase? I have a feeling that downsides of using a personal geodatabase might negate the benefits of the multi-column index.

I have been under the impression that Esri wants us to move away from personal geodatabases, but is this a case where personal geodatabases are the better option? If you have any experience with this, I'd love to know.

Best Answer

To answer the first part of your question, I think it helps to look at the additional text in the Creating Attribute Indexes help file about Multi-column indexes.

The order in which fields appear in a multicolumn index is important. In a multicolumn index with column A preceding column B, column A will be used to conduct the initial search. Also, such an index will be much more useful for queries involving column A only than it will be for queries involving column B only.
Create a multicolumn index on A and B. This index would typically be more efficient for queries involving both columns. For queries involving only A, this index would be slower than an index on A alone. This index would be of little use for queries involving only B. To compensate, you could create an additional index on B.

Both of these passages show that multi-column indexes are better for specialized use. Further, using such an index to sort on only one of the included columns, could actually hurt performance. For this reason, it is likely that individual column indexes are going to be necessary for each of the attributes included in a multi-column index.

I found a link to an old, but interesting document by ESRI stating the 9 reasons to choose a File over a Personal GDB. It is interesting in that it specifically calls out performance as being one reason. Part of this performance gain is due to the file based storage system. I think this could also play into the lack of multi-column support. Unlike in the Personal GDB, which is a single file, an index in a File GDB is stored as a separate file in the GDB structure. This means that the index file and the attribute file for a particular featureclass will have to be linked and accessed together. I could see where a multi-column index would lead to jumping back and forth between the index and attribute files, and potentially causing a performance hit that outweighs the indexing performance gain.

Since there are already significant performance gains with the File GDB over the Personal GDB, it was probably not worth implementing the multi-column index.

In my experience working with both GDB types, I have seen the Personal GDB running about 50% larger than the file. Based on the data you provided regarding your File GDB, if you were to convert to a PGDB, you would probably end up with a ~300MB Personal GDB. From what I've seen, working with MS Access databases, both within ESRI products, and separately, is that you start to see performance degradation once the ".mdb" files increase significantly over 100MB in size.

The other issue would likely be that even if you could speed up your attribute searches, you would see a large performance hit related to moving about in the data frame, and refreshing the view. The layer simply wouldn't draw as fast if it were in a PGDB. This article comparing the Types of Geodatabases gives more information on the performance differences.

As with a lot of things, the best choice will ultimately boil down to what your use case is. If there are a lot of database specific operations that you would like to perform, like queries and updating, that you can do in the Access interface, then the Personal GDB may be better. If you only plan on doing some querying, but will primarily be visualizing the spatial data, then the performance definitely falls on the side of the File GDB.

Related Question