[GIS] What are Esri Geodatabases?

esri-geodatabase

What are the various formats/storage technologies that fall under the name Esri Geodatabase?
What are the most important differences between them?
And (briefly), what are the mainstream ways to use them?

There are conversations all over the place about personal and file and enterprise and (?) geodatabases. These are each different beasts which need distinct handling, but there is a lot of confusion in the answers as to which means apply to which formats of geodatabases.

Update: I should add that I don't feel any single answer has to address the whole range of possibilities. It would be okay to say "the two single-user gdb formats are personal and file gdb, they are appropriate for xxx, have these limits xxx, and the primary differences between them are xxx" etc.

Best Answer

Most of the time, people make the mistake of thinking of the GeoDatabase as simply a geospatial format that allows you to do spatial queries. That is an enormous oversimplification.

Heck, I used to make this mistake myself, until one day I heard Scott Morehouse explaining the rationale behind the GeoDatabase. He is one of those people who can think in very abstract ways, way, way high up, and then come down very quickly and be very practical, thus avoiding the problems that architecture astronauts have.

To understand what the GeoDatabase is, you need to look at the definition of an information model:

An information model in software engineering is a representation of concepts, relationships, constraints, rules, and operations to specify data semantics for a chosen domain of discourse. It can provide sharable, stable, and organized structure of information requirements for the domain context.

The GeoDatabase is simply ESRI's definition of an information model that supports geographic concepts. For example, this information model supports concepts like Topology, with all of the rules, operations and data semantics associated with it (e.g., what is allowed to overlap what, what happens after a split, how an edit affects other features that share the same edge, etc.).
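To make "rules and data semantics" a little more concrete, here is a toy sketch of a single topology rule in the spirit of a "Must Not Overlap" polygon rule. It is purely illustrative and assumes features are axis-aligned rectangles (the `Rect` type and `must_not_overlap` helper are hypothetical); a real GeoDatabase topology works on full geometries with cluster tolerances and is not implemented like this.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle standing in for a polygon feature (toy simplification)."""
    fid: int
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def interiors_overlap(a: Rect, b: Rect) -> bool:
    # Sharing an edge is allowed; only a shared interior counts as an overlap.
    return a.xmin < b.xmax and b.xmin < a.xmax and a.ymin < b.ymax and b.ymin < a.ymax


def must_not_overlap(features: list[Rect]) -> list[tuple[int, int]]:
    """Return (fid, fid) pairs that violate the toy 'must not overlap' rule."""
    return [(a.fid, b.fid) for a, b in combinations(features, 2) if interiors_overlap(a, b)]


parcels = [
    Rect(1, 0, 0, 10, 10),
    Rect(2, 10, 0, 20, 10),   # shares an edge with parcel 1: allowed
    Rect(3, 15, 5, 25, 15),   # overlaps parcel 2: violation
]
print(must_not_overlap(parcels))  # → [(2, 3)]
```

The point is that the rule lives in the information model, independent of whether the rectangles are stored in an .mdb, a .gdb folder or an enterprise DBMS.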

There are various implementations of the ESRI GeoDatabase information model, and they can be categorized into two groups:

  1. Single User GeoDatabases:

    • Personal GeoDatabase: Built on top of the ".mdb" MS Access format.
    • FileGDB: Built on top of a proprietary format created by ESRI (".gdb" folders)
  2. Multi-user GeoDatabases (aka Enterprise GeoDatabases):

    These are the datasources supported by the ArcSDE middleware.

    • PostgreSQL
    • SQL Server
    • Oracle
    • DB2
    • Informix
    • etc
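As a rough way to tell these implementations apart on disk, the sketch below guesses the kind of GeoDatabase from naming conventions alone: a Personal GeoDatabase is a single `.mdb` file, a FileGDB is a folder ending in `.gdb`, and an enterprise GeoDatabase is usually reached through a `.sde` connection file rather than a local data file. The `guess_gdb_kind` helper and its return labels are hypothetical; it opens and validates nothing, so treat it as a heuristic only.

```python
from pathlib import Path


def guess_gdb_kind(path: str) -> str:
    """Heuristically guess which GeoDatabase implementation a path refers to.

    Assumption: based on naming conventions only; nothing is opened or validated.
    """
    suffix = Path(path.rstrip("/\\")).suffix.lower()
    if suffix == ".mdb":
        return "personal"    # single-user, built on an MS Access file
    if suffix == ".gdb":
        return "file"        # single-user, folder in Esri's FileGDB format
    if suffix == ".sde":
        return "enterprise"  # connection file pointing at a multi-user DBMS
    return "unknown"


for p in ["legacy.MDB", "data/parcels.gdb/", "prod_postgres.sde", "roads.shp"]:
    print(p, "->", guess_gdb_kind(p))
```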

The purpose of ArcSDE also gets misunderstood. "SDE" is often confused with the GeoDatabase, and in the worst cases the two terms are used interchangeably, which is a horrible mistake. Back in the day, ArcSDE (then simply called SDE) was created to act as a data abstraction layer. You can find a simple description of ArcSDE in a really old USENET post by Scott Morehouse (1999). A snippet from that post says:

SDE defers spatial processing to the DBMS. If the underlying database system has no spatial support at all, SDE will implement all of the spatial functionality. If the underlying database has some functionality, SDE will implement some functionality and defer the rest to the database engine. To achieve the best performance and leverage core database technology, we try to defer as much functionality to the database as possible.

That means that ArcSDE is used by the GeoDatabase when interacting with the underlying data sources, but it doesn't know anything about GeoDatabase abstractions such as Relationships, Domains, Terrains, Cadastral Fabric, Schematic Datasets, etc. It simply makes programming against the various underlying data stores easier.

That's why, if you are dealing with GeoDatabase-level abstractions and then try to do things directly through ArcSDE (via its API or the ArcSDE command-line executables), you may run into problems. (Can I make this sentence bigger???)
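The deferral strategy quoted from Morehouse's post can be sketched as a capability check: the middleware asks whether the DBMS can perform an operation natively and, if not, runs its own implementation. Everything here is a hypothetical toy (bounding-box "geometries", the `Backend` and `SpatialLayer` classes, the operation names) and is not ArcSDE's actual API; note also that the layer knows nothing about GeoDatabase concepts like relationships or domains, which is exactly the gap described above.

```python
from typing import Callable

# Toy geometry: a bounding box (xmin, ymin, xmax, ymax).
BBox = tuple[float, float, float, float]


def bbox_intersects(a: BBox, b: BBox) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def bbox_area(g: BBox) -> float:
    return (g[2] - g[0]) * (g[3] - g[1])


# Fallback implementations the middleware carries for DBMSs lacking spatial support.
MIDDLEWARE_OPS: dict[str, Callable] = {"intersects": bbox_intersects, "area": bbox_area}


class Backend:
    """A DBMS described only by the spatial operations it can run natively."""

    def __init__(self, name: str, native_ops: dict[str, Callable]):
        self.name = name
        self.native_ops = native_ops


class SpatialLayer:
    """Defer each operation to the DBMS when it has it; otherwise run our own."""

    def __init__(self, backend: Backend):
        self.backend = backend

    def who_runs(self, op: str) -> str:
        return self.backend.name if op in self.backend.native_ops else "middleware"

    def run(self, op: str, *args):
        impl = self.backend.native_ops.get(op) or MIDDLEWARE_OPS[op]
        return impl(*args)


# A DBMS with no spatial support, and one that (in this toy) natively handles "intersects".
plain = SpatialLayer(Backend("plain-dbms", {}))
spatial = SpatialLayer(Backend("spatial-dbms", {"intersects": bbox_intersects}))
print(plain.who_runs("intersects"), spatial.who_runs("intersects"), spatial.who_runs("area"))
print(spatial.run("area", (0, 0, 2, 3)))
```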

As for the limitations of each implementation of the GeoDatabase, they usually depend on the underlying storage.

Personal GDB is bound by the 2 GB size limit of the Access (.mdb) format. FileGDB doesn't have this problem, since it was created to get rid of this limitation and to be compatible with Unix.
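If you are still on a Personal GDB, a simple check of the .mdb file size against that ceiling tells you how close you are to needing a migration. This is a minimal sketch that assumes the limit is 2 GiB; Access also reserves part of that space for system objects, so the practical ceiling is somewhat lower. The helper names and the 10% warning threshold are hypothetical choices.

```python
import os

# Assumption: treat the Access (.mdb) ceiling as 2 GiB. Access reserves some of
# that space for system objects, so the usable size is a bit smaller.
ACCESS_LIMIT_BYTES = 2 * 1024**3


def headroom_fraction(size_bytes: int, limit: int = ACCESS_LIMIT_BYTES) -> float:
    """Share of the limit still unused (0.0 once the file reaches the limit)."""
    return max(0.0, (limit - size_bytes) / limit)


def personal_gdb_headroom(mdb_path: str, warn_below: float = 0.1) -> float:
    """Check an on-disk .mdb and warn when it is close to the Access ceiling."""
    frac = headroom_fraction(os.path.getsize(mdb_path))
    if frac < warn_below:
        print(f"{mdb_path}: {frac:.0%} headroom left; consider moving to a FileGDB")
    return frac
```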

Both Personal GDB and FileGDB are single-user, so you don't get versioning. GDB replication is implemented on top of versioning, so it is available only in Multi-user GeoDatabases (ArcSDE data sources).

Topology, Annotations, Representation Classes, Domains, Terrains, etc., are all GeoDatabase concepts that do not require multi-user support, so they are available across all implementations of the GeoDatabase information model.
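The feature split described in the last two paragraphs can be summarized as a lookup table. This sketch encodes only what is stated above (versioning is multi-user only, replication builds on versioning, and the other concepts are available everywhere); the kind labels, feature keys and `supports` helper are hypothetical, and the actual matrix can differ between ArcGIS releases.

```python
SINGLE_USER = frozenset({"personal", "file"})
MULTI_USER = frozenset({"enterprise"})
ALL_KINDS = SINGLE_USER | MULTI_USER

# Encodes the statements above: versioning needs multi-user storage, and
# replication is built on top of versioning; the rest are information-model concepts.
FEATURE_SUPPORT = {
    "versioning": MULTI_USER,
    "replication": MULTI_USER,
    "topology": ALL_KINDS,
    "annotation": ALL_KINDS,
    "representation_classes": ALL_KINDS,
    "domains": ALL_KINDS,
    "terrains": ALL_KINDS,
}


def supports(kind: str, feature: str) -> bool:
    """True if the given GeoDatabase kind offers the feature, per the table above."""
    if kind not in ALL_KINDS:
        raise ValueError(f"unknown geodatabase kind: {kind!r}")
    return kind in FEATURE_SUPPORT.get(feature, frozenset())


print(supports("file", "topology"), supports("file", "replication"))
```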

As for which GDB implementation to use, it depends on your needs; there is a type of GeoDatabase for most (but not all) use cases.

I hope this makes it clear.