[GIS] Large Scale Geocoding and Processing in ESRI

arcgis-10.0, arcgis-9.3, enterprise-geodatabase

OK, so I guess this is kind of an informal query/survey about how big the datasets are that you are using in your ESRI worlds…

I am building and maintaining a statewide dataset where I have to process down to the individual house level (not the parcel level, but multiple mailing addresses per parcel) for our systems. In many places I am using theoretical addresses calculated from the street network or from USPS AMS/AIS data. So my address list is roughly 13.5 million addresses and growing monthly or quarterly.

Is anyone out there right now maintaining a live system of address/property lookup information that is this large in a continuous dataset?

I would love to collaborate or talk more about how others are handling such a large dataset. I am seeing issues where ESRI software seems to be blowing up when I try to perform tasks such as intersects or spatial joins. ESRI says they don't see these kinds of issues, but I have had them since back in 9.3.1, so I can't be the first/only person doing this, especially since I can recreate the problem across multiple machines.

My platform right now is ESRI ArcGIS 10 on the desktop, talking to ArcSDE 9.3.1 SP1 on a SQL Server 2008 backend using the GEOMETRY spatial type. So I am not doing anything really exotic, but it still seems to me that in some areas I may be pushing the envelope.

[Further]

What I am interested in knowing is what other people are doing to optimize their processes for dealing with these datasets. I am going to be adding upwards of a million records a month going forward, and while geocoding etc. isn't a problem, once you start running other processes and linking data for further analysis you start dealing with complex joins. You can output data from Intersects/Overlays/Identities using ONLY_FID and get a thin middle table to join to; but when you try to divide and conquer the creation of that table, you hit the issue that splitting your source data into working areas gives you repeating IDs that you can't merge back, so you are left with smaller blocks of data that you can't easily make whole again (see the sketch below).
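For what it's worth, one workaround for the repeating-ID problem is to avoid exporting the working areas at all and instead select them with feature layers, since features in a layer keep their source OBJECTIDs. Here is a minimal arcpy sketch of that idea; the connection, the feature class names, the COUNTY_FIPS field, and the working-area list are all hypothetical:

    # Sketch only: assumes a statewide address feature class with a
    # COUNTY_FIPS field and a parcel overlay; every name here is made up.
    import arcpy

    arcpy.env.workspace = r"Database Connections\prod.sde"
    addresses = "GIS.DBO.ADDRESSES"
    parcels = "GIS.DBO.PARCELS"

    pieces = []
    for fips in ["001", "003", "005"]:  # hypothetical working areas
        # Select with a feature layer rather than exporting a copy:
        # features in a layer keep their source OBJECTIDs, so the FID_*
        # fields an ONLY_FID intersect writes should still reference the
        # master tables after the split.
        lyr = "addr_" + fips
        arcpy.MakeFeatureLayer_management(addresses, lyr,
                                          "COUNTY_FIPS = '%s'" % fips)
        out_fc = "in_memory/ix_" + fips
        arcpy.Intersect_analysis([lyr, parcels], out_fc, "ONLY_FID")
        pieces.append(out_fc)

    # The per-county outputs now share one ID space, so merging them
    # back into a single thin join table is safe.
    arcpy.Merge_management(pieces, "in_memory/ix_statewide")

The same loop works with persisted outputs instead of in_memory if the pieces are too large to hold at once.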

I am thinking about options that break the data down to a county-by-county scale and then use spatial views to join it back together (something like the view sketched below). Just curious whether other users are looking at the same kinds of problems at such a large scale but with small footprints.
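Purely as an illustration of that spatial-view idea, and not the definitive way to do it, here is a rough sketch that unions hypothetical per-county result tables back into one statewide view by pushing SQL through arcpy's ArcSDESQLExecute. The connection parameters, table names, and columns are all made up, and registering the result as a true ArcSDE spatial view would be a separate step (e.g. via the sdetable command):

    # Hypothetical connection parameters for the SQL Server geodatabase.
    import arcpy

    sde = arcpy.ArcSDESQLExecute("myserver", "5151", "gis",
                                 "dbo_user", "password")

    # One view unions the per-county thin tables back into a single
    # statewide "virtual" table that downstream joins can hit at once.
    sde.execute("""
    CREATE VIEW dbo.IX_STATEWIDE AS
    SELECT ADDR_KEY, PARCEL_KEY, SHAPE FROM dbo.IX_001
    UNION ALL
    SELECT ADDR_KEY, PARCEL_KEY, SHAPE FROM dbo.IX_003
    UNION ALL
    SELECT ADDR_KEY, PARCEL_KEY, SHAPE FROM dbo.IX_005
    """)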

Best Answer

As it's an (old) open-ended question, I'll give you an open-ended answer: using the database properly can save massive amounts of time. The obvious way to do something isn't necessarily the fastest. For instance, when I recently wanted to delete a lot of rows from Oracle, it turned out that just sending delete from TABLE1 where ID = 123 for each feature was incredibly slow, and that there's some fancy Oracle stuff I can do to make it orders of magnitude faster.
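The answer doesn't say which Oracle feature that was, so purely as a sketch of the general set-based idea, here is a cx_Oracle snippet contrasting the row-at-a-time pattern with batched and set-based deletes; TABLE1, ID, the connection string, and the staging table IDS_TO_GO are all hypothetical:

    # Illustration only: contrasts per-row deletes with batched and
    # set-based alternatives. All names are hypothetical.
    import cx_Oracle

    conn = cx_Oracle.connect("user/password@tnsname")
    cur = conn.cursor()
    ids_to_delete = [123, 456, 789]  # imagine millions of these

    # Slow: one round trip and one statement execution per row.
    # for fid in ids_to_delete:
    #     cur.execute("DELETE FROM TABLE1 WHERE ID = :1", [fid])

    # Faster: bind-variable batching sends the whole list in one call.
    cur.executemany("DELETE FROM TABLE1 WHERE ID = :1",
                    [(fid,) for fid in ids_to_delete])

    # Or go fully set-based: stage the IDs in a table, then one DELETE
    # lets Oracle do all the work inside the database.
    cur.execute("DELETE FROM TABLE1 "
                "WHERE ID IN (SELECT ID FROM IDS_TO_GO)")
    conn.commit()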

So basically, if you find a particular problem that's a bottleneck, ask a specific question about that bottleneck to the experts. For the ArcGIS side that would probably be here (or the ESRI forums, or your ESRI support), but for a database-side issue (and things will usually be faster if you do them there) you'd want to ask at http://www.stackoverflow.com
