[GIS] Table structure for a reverse geocoding database

database, reverse-geocoding

I should start by saying that I do not have much experience with GIS, so this question may be foolish. What I do have is a database for performing a lookup from a set of coordinates to find the bounding box or LSD in which they reside. At the moment the database contains several thousand tables; we run some client-side code to decide which table to query, load that entire table back to the client, and filter it client side. That seems dramatically inefficient to me, as we are passing back quite a lot of data. Each table covers a large area to search, and each record within it defines a smaller area. The areas are defined by four corner points. I suppose the original design was based on a quadtree, but instead of four children and many levels it was decided to have only two levels and many children at each level. I am looking to optimize this structure.

The database is SQL Server 2005, so there is no help available from native geospatial data types. What I would like to know is:

  • Is this sort of thousand+ table database common for GIS work?
  • Would it not be more efficient to keep the data in a single table and take advantage of some index magic? (Is it possible to index efficiently on the decimals which define each corner of the bounding box so that you can find intermediate values? See the sketch after this list for the sort of layout I mean.)
  • How could this structure be improved to allow for rapid queries?
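
For concreteness, here is a rough sketch of the single-table layout I have in mind. The table and column names are made up, and it assumes the four corner points can be reduced to axis-aligned min/max bounds (or that an enclosing min/max box is stored alongside the true corners):

    -- Illustrative single-table layout: one row per LSD / small bounding box.
    CREATE TABLE dbo.BoundingBox (
        BoxId   INT IDENTITY(1,1) PRIMARY KEY,
        LsdCode VARCHAR(20)   NOT NULL,
        MinLat  DECIMAL(9, 6) NOT NULL,
        MaxLat  DECIMAL(9, 6) NOT NULL,
        MinLon  DECIMAL(9, 6) NOT NULL,
        MaxLon  DECIMAL(9, 6) NOT NULL
    );

    -- Covering index over the bounds, so a point lookup is answered from the
    -- index instead of shipping an entire table back to the client.
    CREATE INDEX IX_BoundingBox_Bounds
        ON dbo.BoundingBox (MinLat, MaxLat, MinLon, MaxLon)
        INCLUDE (LsdCode);

    -- Point lookup: which box contains (@lat, @lon)?
    DECLARE @lat DECIMAL(9, 6), @lon DECIMAL(9, 6);
    SET @lat = 53.5461;
    SET @lon = -113.4938;

    SELECT LsdCode
    FROM dbo.BoundingBox
    WHERE @lat BETWEEN MinLat AND MaxLat
      AND @lon BETWEEN MinLon AND MaxLon;

The catch is that a single B-tree can only seek on one of the bounds (here MinLat), so the query above still scans every row whose MinLat lies below the point; that is the gap I am hoping some "index magic" can close.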

Best Answer

I agree with Mapperz: PostgreSQL/PostGIS would be a good way to go if you're looking to change your database system at the same time.

But if you're stuck with SQL Server 2005 (which, as you say, has no spatial extensions), you could create a hash function of the input coordinates such that geographically close locations are also close in the table, but I can't think of one offhand.
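
For illustration only (this is one commonly used scheme, not necessarily what the answer had in mind), a locality-preserving key can be built as a Z-order (Morton) code: quantise the latitude and longitude and interleave their bits, so points that are close on the ground usually end up with numerically close keys. A minimal T-SQL sketch, with made-up names and 16 bits per axis:

    -- Illustrative Z-order (Morton) key: interleave the bits of the quantised
    -- longitude (x) and latitude (y) so nearby points tend to get nearby keys.
    CREATE FUNCTION dbo.MortonKey (@lat FLOAT, @lon FLOAT)
    RETURNS BIGINT
    AS
    BEGIN
        DECLARE @x INT, @y INT, @i INT, @bit BIGINT, @key BIGINT;

        -- Quantise each coordinate to 16 bits (0..65535).
        SET @x = CAST((@lon + 180.0) / 360.0 * 65535 AS INT);
        SET @y = CAST((@lat +  90.0) / 180.0 * 65535 AS INT);

        SET @key = 0;
        SET @bit = 1;
        SET @i   = 0;

        -- Bit i of x goes to key bit 2i, bit i of y goes to key bit 2i+1.
        WHILE @i < 16
        BEGIN
            SET @key = @key + (@x % 2) * @bit + (@y % 2) * @bit * 2;
            SET @x   = @x / 2;
            SET @y   = @y / 2;
            SET @bit = @bit * 4;
            SET @i   = @i + 1;
        END

        RETURN @key;
    END
    GO

    -- Example: store dbo.MortonKey(lat, lon) in a key column and cluster on it,
    -- so geographically close records sit near each other in the table.
    SELECT dbo.MortonKey(53.5461, -113.4938) AS ZKey;

Storing this key in a column and clustering the table on it keeps geographically close boxes physically close, and a point lookup can then seek to a narrow key range before doing the exact bounding-box test. Z-order locality isn't perfect, though: nearby points that straddle a major cell boundary can end up with keys that are far apart.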

An alternative may be to create a series of tables, one for each level of a quadtree. You may only need 5 or 6 such tables depending on the coverage of your dataset. You do a query on the top level, which gives you a value that is used as part of the key of the next level, and so on. That way you're only testing against at most four bounding boxes at each level, so with 6 levels that's anywhere between 6 and 24 bounding-box tests and 6 table queries. Now I've not tried this, so it may not be as efficient as it seems in my head, but it may be worth a try.
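
A sketch of what that chain of per-level tables and point queries might look like, with illustrative names only (two levels shown; deeper levels repeat the pattern):

    -- Illustrative per-level tables for the quadtree idea (two levels shown).
    CREATE TABLE dbo.QuadLevel1 (
        CellKey INT PRIMARY KEY,
        MinLat  FLOAT NOT NULL, MaxLat FLOAT NOT NULL,
        MinLon  FLOAT NOT NULL, MaxLon FLOAT NOT NULL
    );

    CREATE TABLE dbo.QuadLevel2 (
        ParentKey INT NOT NULL,            -- CellKey of the containing level-1 cell
        CellKey   INT NOT NULL,
        MinLat    FLOAT NOT NULL, MaxLat FLOAT NOT NULL,
        MinLon    FLOAT NOT NULL, MaxLon FLOAT NOT NULL,
        PRIMARY KEY (ParentKey, CellKey)   -- at most four child rows per parent
    );

    -- Walk down one level at a time; each step tests only a handful of cells.
    DECLARE @lat FLOAT, @lon FLOAT, @key1 INT, @key2 INT;
    SET @lat = 53.5461;
    SET @lon = -113.4938;

    SELECT @key1 = CellKey
    FROM dbo.QuadLevel1
    WHERE @lat BETWEEN MinLat AND MaxLat
      AND @lon BETWEEN MinLon AND MaxLon;

    SELECT @key2 = CellKey
    FROM dbo.QuadLevel2
    WHERE ParentKey = @key1
      AND @lat BETWEEN MinLat AND MaxLat
      AND @lon BETWEEN MinLon AND MaxLon;

    -- ...and so on, down to the deepest level, whose rows carry the actual LSD records.

Because each parent cell splits into at most four children, each level's query only has to test a few rows, which is where the 6-to-24 bounding-box tests above come from.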
