[GIS] ogr2ogr slow to load ArcGIS file geodatabase data into PostgreSQL

Tags: file-geodatabase, gdal, ogr2ogr, postgresql

Problem: Using ogr2ogr to transfer feature sets and attribute tables from ArcGIS file geodatabases to a PostgreSQL database is taking unnecessarily long. For instance, regardless of the machine I am working on, it takes roughly two days to load a table with 9 million rows; each row has five attributes with types of double precision, integer, integer, varchar(55), and double precision. I am looking for help because this is going to be an ongoing task and I am trying to develop a repeatable work process.

What I am hoping to find out: I have done some searches on speeding up ogr2ogr and PostgreSQL, but the most common suggestions have not sped things up. My guess is that I am either missing a flag in ogr2ogr or overlooking a setting in postgresql.conf.

What I have tried:
The ogr2ogr command format I am using is:

ogr2ogr -progress -overwrite -skipfailures --config PG_USE_COPY YES -f "PostgreSQL" PG:"host=*hostip* user=*username* dbname=*databasename* password=*userpassword*" "C:\pathTo\GDBName.gdb" "*tablename*"

Based on PostgreSQL tuning recommendations I found, I made the following changes to postgresql.conf:

  • shared_buffers – increased to 500MB
  • temp_buffers – increased to 32MB
  • work_mem – increased to 32MB
  • maintenance_work_mem – uncommented but left at its default value (64MB)
  • wal_buffers – increased to 8MB
  • checkpoint_timeout – increased to 15min
  • max_wal_size – increased to 15GB
  • checkpoint_completion_target – increased to 0.85
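
For reference, these correspond to lines like the following in my postgresql.conf (a sketch with the values listed above; parameter names as in PostgreSQL 9.5):

shared_buffers = 500MB
temp_buffers = 32MB
work_mem = 32MB
maintenance_work_mem = 64MB            # uncommented, default value kept
wal_buffers = 8MB
checkpoint_timeout = 15min
max_wal_size = 15GB
checkpoint_completion_target = 0.85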

Environment:

  • PostgreSQL computer – Windows 10 Enterprise (64-bit), Intel Xeon CPU (2.6 GHz), 16GB RAM, PostgreSQL 9.5 with PostGIS. This is not an ESRI Enterprise Database (SDE).
  • Data source computers (various) – So far I have used Windows 7, Windows 10, and Ubuntu 17.04 computers to run ogr2ogr. All have 8GB RAM and CPUs that are 2-3 years old. All are using GDAL 2.2.1.

Note: I know the PostgreSQL database can load feature sets and attribute tables from a file geodatabase quickly. I first loaded my data using ArcCatalog 10.5 and everything loaded quickly; the large tables took 2-4 hours on average, as opposed to the 2 days to 1.5 weeks ogr2ogr is taking. The problem with ArcCatalog, and the reason I am using ogr2ogr, is that it converted all the double precision ID fields to integer, which made the ID fields useless.

Even though the IDs came through corrupted, the fact that the database loaded these (to me) large tables rapidly via ArcCatalog is why I assume I have overlooked a simple configuration tweak.

Best Answer

After turning debugging on, the problem became clear. [face palm]
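
For reference, GDAL's debug output can be switched on with the general --debug option or the CPL_DEBUG config option; this is not necessarily the exact command I ran, just one way to see the row-level messages:

ogr2ogr --debug ON -progress -overwrite -skipfailures --config PG_USE_COPY YES -f "PostgreSQL" PG:"host=*hostip* user=*username* dbname=*databasename* password=*userpassword*" "C:\pathTo\GDBName.gdb" "*tablename*"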

ogr2ogr was encountering an error on every row it added to the PostgreSQL table it created. The errors occurred because the numeric IDs are 14-digit whole numbers, which is why they are stored as double precision rather than integer: a 14-digit value is far larger than the 32-bit integer maximum of 2,147,483,647. As ogr2ogr created each table on the fly and populated it, it treated the whole numbers as integers, threw a COPY error, and then accepted that the field was double precision (I assume from the field definitions the file geodatabase keeps) and put the correct values in the table.
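
A quick way to see why the integer guess fails is a check in psql (not part of my load, just an illustration):

-- A 14-digit whole number cannot be stored in a 32-bit integer,
-- but it is exactly representable as double precision (whole numbers up to 2^53 are exact).
SELECT 12345678901234::integer;           -- ERROR:  integer out of range
SELECT 12345678901234::double precision;  -- succeeds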

The problem was solved by setting up the table structure in PostgreSQL first and then using the -update, -append, and -nln flags. What had taken 46 minutes on my test table using the ogr2ogr command from my question took only 5 seconds once those flags were added, because no errors occurred while loading the rows.

Setting up each table in PostgreSQL prior to loading the data will be tedious (there are 155 tables across the file geodatabases), but as this is an annual task I can either dump/load the schema structure each year or update and rerun the table-building SQL. It is a fair amount of upfront work, but loading the data in each subsequent year will be significantly faster than the first attempt.
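
As a sketch of what that table-building SQL looks like (the schema, table, and column names here are placeholders standing in for the schema.existingtable used in the command below; the types mirror the five-attribute example from my question, and the column names need to match the geodatabase field names so -append can map them by name):

CREATE SCHEMA IF NOT EXISTS myschema;

CREATE TABLE myschema.existingtable (
    site_id  double precision,   -- the 14-digit IDs stay double precision, not integer
    attr_a   integer,
    attr_b   integer,
    label    varchar(55),
    reading  double precision
);

Once the tables exist, their structure can be captured for reuse in later years with pg_dump --schema-only.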

The ogr2ogr command format is now:

ogr2ogr -progress -update -append -f "PostgreSQL" PG:"host=hostip user=username dbname=databasename password=userpassword" "C:\pathTo\GDBName.gdb" -nln schema.existingtable "gdbtablename" --config PG_USE_COPY YES