What is the difference between a spatial index and an attribute index in an ArcGIS geodatabase?
What are their purposes?
arcgis-desktop esri-geodatabase
Slightly edited snippet from a conversation in the GDAL/OGR Mailing list:
Relationship Classes come in two types:
1.- Simple Relationship Classes
2.- Attributed Relationship Classes
Conceptually, they just relate one (or more) column(s) in one table to one (or more) column(s) in another table. Besides including cardinality information and enforcing referential integrity (when the underlying db doesn't support it), they are used inside ArcGIS for display and editing purposes. The fact that they may or may not have domains associated with them is orthogonal to this discussion.
For the first kind (simple), they only exist in metadata tables - they don't map to any physical tables on the db.
For the second kind (attributed), they do refer to actual non-spatial tables on the db.
Most of the time, people make the mistake of thinking of the GeoDatabase as simply a geospatial format that allows you to do spatial queries. That is such an incredible simplification.
Heck, I used to make this mistake myself - until one day I heard Scott Morehouse explaining the rationale behind the GeoDatabase. He is one of those people who can think in very abstract ways, way, way high up, and then come down very fast and be very practical, thus avoiding the problems that architectural astronauts have.
To understand what the GeoDatabase is, you need to look at the definition of an information model:
An information model in software engineering is a representation of concepts, relationships, constraints, rules, and operations to specify data semantics for a chosen domain of discourse. It can provide sharable, stable, and organized structure of information requirements for the domain context.
The GeoDatabase is simply an ESRI definition of an information model that supports geographic concepts. For example, this information model supports concepts like Topology, with all of the rules, operations and data semantics associated with them (e.g., what is allowed to overlap on top of what, what happens after a split, how an edit affects other features that share the same edge, etc.).
There are various implementations of the ESRI GeoDatabase information model, and they fall into two categories:
Single User GeoDatabases:
Multi-user GeoDatabases (aka Enterprise GeoDatabases):
These are the datasources supported by the ArcSDE middleware.
The purpose of ArcSDE also gets misunderstood. "SDE" often gets confused with a GeoDatabase - and in the worst cases, the terms are used interchangeably, which is a horrible mistake. Back in the day, ArcSDE (then simply called SDE) was created to act as a data abstraction layer. You can find a simple description of ArcSDE in a really old USENET news post from Scott Morehouse (1999). A snippet from that post says:
SDE defers spatial processing to the DBMS. If the underlying database system has no spatial support at all, SDE will implement all of the spatial functionality. If the underlying database has some functionality, SDE will implement some functionality and defer the rest to the database engine. To achieve the best performance and leverage core database technology, we try to defer as much functionality to the database as possible.
That means that ArcSDE is used by the GeoDatabase when interacting with underlying data sources, but it doesn't know anything about GeoDatabase abstractions, like Relationships, Domains, Terrains, Cadastral Fabric, Schematic Datasets, etc. It is just used to make programming easier with various underlying data stores.
That's why if you are dealing with GeoDatabase-level abstractions, and then you try to do things from ArcSDE (via API or arcsde command line executables), you may run into problems. (Can I make this sentence bigger???)
The limitations of each GeoDatabase implementation usually depend on the underlying storage.
Personal GDB is bound to the 2GB mdb (Access) limit. FileGDB doesn't have this problem, since it was created to get rid of this limitation and to be compatible with Unix.
Both Personal GDB and FileGDB are single user, so you don't get any versioning. GDB replication is implemented on top of versioning, so it is available only in Multi-user GeoDatabases (ArcSDE Datasources).
Topology, Annotations, Representation Classes, Domains, Terrains, etc, are all GeoDatabase concepts that do not require multi-user support - so they are available across all implementations of the GeoDatabase information model.
Which GDB implementation to use depends on your needs; there is a type of GeoDatabase for most (but not all) use cases.
I hope this makes it clear.
Best Answer
Think of your data as a book. If you have a table with millions of records, you can think of it as a book with millions of pages. If you had to find every instance of a given word in this book by reading it page by page, it would take a very long time. It is the same when searching a database table with many records.
An attribute index functions the same as an index in a book. If you are looking for a particular record, an index helps the database engine find it faster. When a database creates an index, it creates a separate index table which organizes your key fields in a way that makes them easy to find. The index table contains address pointers to where each record can be found in the source table.
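As a rough illustration of the book-index idea, here is a minimal Python sketch. It is not how ArcGIS or any particular DBMS implements attribute indexes (real engines typically use tree structures stored on disk); the table contents, the `owner` column, and the function names are all made up for the example. A plain dictionary stands in for the index table, and list positions stand in for address pointers.

```python
from collections import defaultdict

# Hypothetical table: each row is (object_id, owner). Invented sample data.
rows = [
    (1, "Smith"),
    (2, "Jones"),
    (3, "Smith"),
    (4, "Garcia"),
]

def full_scan(rows, owner):
    """Without an index: look at every row, like reading every page."""
    return [row for row in rows if row[1] == owner]

def build_attribute_index(rows, column):
    """Build the 'index table': each key value maps to the row positions
    (our stand-in for address pointers) where that value occurs."""
    index = defaultdict(list)
    for position, row in enumerate(rows):
        index[row[column]].append(position)
    return index

def indexed_lookup(index, rows, value):
    """With an index: jump straight to the matching rows."""
    return [rows[position] for position in index.get(value, [])]

owner_index = build_attribute_index(rows, column=1)
smith_rows = indexed_lookup(owner_index, rows, "Smith")
```

Both approaches return the same rows; the difference is that the scan touches every record, while the lookup only touches the records the index points to. The cost is that the index has to be built and kept up to date as rows change.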
A spatial index is the same idea. It makes finding spatial relationships between geometries faster. If you were looking for which polygons intersect another polygon, it wouldn't be very efficient to scan every single vertex of every single polygon in a spatial table.
A spatial index provides metadata for quickly placing spatial data in relationship to other spatial data. A "first order" spatial query will compare the MBRs (minimum bounding rectangles) of the polygons to see whether their extents even overlap. If so, a "second order" query will check whether the actual vertices interact.
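The two-pass idea can be sketched in plain Python. This is only an illustration under simplifying assumptions: polygons are simple rings given as lists of (x, y) tuples, the feature names and coordinates are invented, and the first pass scans a plain dictionary of MBRs. A real spatial index would organize the MBRs in a grid or tree so that even the first pass does not have to visit every feature.

```python
def mbr(polygon):
    """Minimum bounding rectangle as (xmin, ymin, xmax, ymax)."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return min(xs), min(ys), max(xs), max(ys)

def mbrs_overlap(a, b):
    """First-order test: do two rectangles overlap (touching counts)?"""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def _orient(p, q, r):
    """Cross product sign: which side of line p->q the point r lies on."""
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])

def _within_box(p, q, r):
    """For r collinear with p->q: does r lie between p and q?"""
    return (min(p[0], q[0]) <= r[0] <= max(p[0], q[0])
            and min(p[1], q[1]) <= r[1] <= max(p[1], q[1]))

def segments_intersect(p1, p2, p3, p4):
    d1, d2 = _orient(p3, p4, p1), _orient(p3, p4, p2)
    d3, d4 = _orient(p1, p2, p3), _orient(p1, p2, p4)
    if d1 * d2 < 0 and d3 * d4 < 0:
        return True  # the segments properly cross
    # Collinear / touching cases.
    return ((d1 == 0 and _within_box(p3, p4, p1))
            or (d2 == 0 and _within_box(p3, p4, p2))
            or (d3 == 0 and _within_box(p1, p2, p3))
            or (d4 == 0 and _within_box(p1, p2, p4)))

def point_in_polygon(pt, polygon):
    """Ray casting: count edge crossings to the right of the point."""
    x, y = pt
    inside = False
    n = len(polygon)
    for i in range(n):
        (x1, y1), (x2, y2) = polygon[i], polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            if x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
                inside = not inside
    return inside

def edges(polygon):
    n = len(polygon)
    return [(polygon[i], polygon[(i + 1) % n]) for i in range(n)]

def polygons_intersect(a, b):
    """Second-order test: compare the actual vertices and edges."""
    if any(segments_intersect(p, q, r, s)
           for p, q in edges(a) for r, s in edges(b)):
        return True
    # No edges cross: one polygon may still lie entirely inside the other.
    return point_in_polygon(a[0], b) or point_in_polygon(b[0], a)

def query(features, query_polygon):
    """Return (candidates after the MBR pass, hits after the exact pass)."""
    query_mbr = mbr(query_polygon)
    index = {name: mbr(poly) for name, poly in features.items()}
    candidates = [n for n, box in index.items() if mbrs_overlap(box, query_mbr)]
    hits = [n for n in candidates if polygons_intersect(features[n], query_polygon)]
    return candidates, hits

# Invented sample data: a triangle as the query shape and three features.
query_polygon = [(0, 0), (4, 0), (0, 4)]
features = {
    "near_corner": [(4, 4), (4, 3), (3, 4)],         # MBRs overlap, shapes do not
    "crossing": [(1, 1), (5, 1), (5, 2), (1, 2)],    # really intersects
    "far_away": [(10, 10), (11, 10), (11, 11), (10, 11)],
}
candidates, hits = query(features, query_polygon)
```

Here `far_away` is rejected by the cheap rectangle comparison alone, `near_corner` survives the first pass but fails the exact test, and only `crossing` is returned as a hit. The expensive vertex-level comparison runs on two features instead of three; on a table with millions of features the saving is what makes spatial queries practical.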