[GIS] ogr2ogr slow to load ArcGIS file geodatabase data into PostgreSQL

Tags: file-geodatabase, gdal, ogr2ogr, postgresql

Problem: Using ogr2ogr to transfer feature sets and attribute tables from ArcGIS file geodatabases to a PostgreSQL database is taking unnecessarily long. For instance, regardless of the machine I am working on, it takes roughly two days to load a table with 9 million rows; each row has five attributes with types of double precision, integer, integer, varchar(55), and double precision. I am looking for help because this is going to be an ongoing task and I am trying to develop a repeatable work process.

What I am hoping to find out: I have done some searches on speeding up ogr2ogr and PostgreSQL, but the most common suggestions have not sped things up. My guess is that I am either missing a flag in ogr2ogr or overlooking a setting in postgresql.conf.

What I have tried:
The ogr2ogr command format I am using is:

ogr2ogr -progress -overwrite -skipfailures --config PG_USE_COPY YES -f "PostgreSQL" PG:"host=*hostip* user=*username* dbname=*databasename* password=*userpassword*" "C:\pathTo\GDBName.gdb" "*tablename*"

Based on PostgreSQL tuning recommendations I found, I made the following changes to postgresql.conf:

  • shared_buffers – increased to 500MB
  • temp_buffers – increased to 32MB
  • work_mem – increased to 32MB
  • maintenance_work_mem – uncommented but left at its default value (64MB)
  • wal_buffers – increased to 8MB
  • checkpoint_timeout – increased to 15min
  • max_wal_size – increased to 15GB
  • checkpoint_completion_target – increased to 0.85
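
For reference, these correspond to lines like the following in my postgresql.conf (a sketch with the values listed above; parameter names as in PostgreSQL 9.5):

shared_buffers = 500MB
temp_buffers = 32MB
work_mem = 32MB
maintenance_work_mem = 64MB            # uncommented, default value kept
wal_buffers = 8MB
checkpoint_timeout = 15min
max_wal_size = 15GB
checkpoint_completion_target = 0.85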

Environment:

  • PostgreSQL computer – Windows 10 Enterprise (64-bit), Intel Xeon CPU (2.6 GHz), 16GB RAM, PostgreSQL 9.5 with PostGIS. This is not an ESRI Enterprise Database (SDE).
  • Data source computers (various) – So far I have used Windows 7, Windows 10, and Ubuntu 17.04 computers to run ogr2ogr. All have 8GB RAM and CPUs that are 2-3 years old. All are using GDAL 2.2.1.

Note: I know the PostgreSQL database can load feature sets and attribute tables from a file geodatabase quickly. I first loaded my data using ArcCatalog 10.5 and everything loaded quickly; the large tables took 2-4 hours on average, as opposed to the 2 days to 1.5 weeks ogr2ogr is taking. The problem with ArcCatalog, and the reason I am using ogr2ogr, is that it converted all the double precision ID fields to integer, which made the ID fields useless.

Even though the IDs came through corrupted, the fact that the database loaded these (to me) large tables rapidly via ArcCatalog is why I assume I have overlooked a simple configuration tweak.

Best Answer

After turning debugging on, the problem became clear. [face palm]
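
For reference, GDAL's debug output can be switched on with the general --debug option or the CPL_DEBUG config option; this is not necessarily the exact command I ran, just one way to see the row-level messages:

ogr2ogr --debug ON -progress -overwrite -skipfailures --config PG_USE_COPY YES -f "PostgreSQL" PG:"host=*hostip* user=*username* dbname=*databasename* password=*userpassword*" "C:\pathTo\GDBName.gdb" "*tablename*"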

ogr2ogr was encountering an error on every row it added to the PostgreSQL table it created. The errors occurred because the numeric IDs are 14-digit whole numbers, which is why they are stored as double precision rather than integer: a 14-digit value is far larger than the 32-bit integer maximum of 2,147,483,647. As ogr2ogr created each table on the fly and populated it, it treated the whole numbers as integers, threw a COPY error, and then accepted that the field was double precision (I assume from the field definitions the file geodatabase keeps) and put the correct values in the table.
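
A quick way to see why the integer guess fails is a check in psql (not part of my load, just an illustration):

-- A 14-digit whole number cannot be stored in a 32-bit integer,
-- but it is exactly representable as double precision (whole numbers up to 2^53 are exact).
SELECT 12345678901234::integer;           -- ERROR:  integer out of range
SELECT 12345678901234::double precision;  -- succeeds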

The problem was solved by setting up the table structure in PostgreSQL first and then using the -update, -append, and -nln flags. What had taken 46 minutes on my test table using the ogr2ogr command from my question took only 5 seconds once those flags were added, because no errors occurred while loading the rows.

Setting up each table in PostgreSQL prior to loading the data will be tedious (there are 155 tables across the file geodatabases), but as this is an annual task I can either dump/load the schema structure each year or update and rerun the table-building SQL. It is a fair amount of upfront work, but loading the data in each subsequent year will be significantly faster than the first attempt.
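
As a sketch of what that table-building SQL looks like (the schema, table, and column names here are placeholders standing in for the schema.existingtable used in the command below; the types mirror the five-attribute example from my question, and the column names need to match the geodatabase field names so -append can map them by name):

CREATE SCHEMA IF NOT EXISTS myschema;

CREATE TABLE myschema.existingtable (
    site_id  double precision,   -- the 14-digit IDs stay double precision, not integer
    attr_a   integer,
    attr_b   integer,
    label    varchar(55),
    reading  double precision
);

Once the tables exist, their structure can be captured for reuse in later years with pg_dump --schema-only.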

The ogr2ogr command format is now:

ogr2ogr -progress -update -append -f "PostgreSQL" PG:"host=hostip user=username dbname=databasename password=userpassword" "C:\pathTo\GDBName.gdb" -nln schema.existingtable "gdbtablename" --config PG_USE_COPY YES