[GIS] Extremely slow OSM data import

openstreetmaposm2pgsql

I'm using osm2pgsql to import the entire planet database. I've followed the instructions from the switch2osm (https://switch2osm.org/serving-tiles/manually-building-a-tile-server-14-04/) page and used the following command:

osm2pgsql --slim -C16384 --cache-strategy sparse -d osm_2015 --number-processes 4 planet-latest-september-5.pbf

Note that I am doing the import on a virtual machine (VM). Other details are as follows:

Planet file size: 29GB
RAM: 24GB
CPU: 8 X Intel(R) Xeon(R) CPU X5650  @ 2.67GHz
4TB disk mounted remotely
OS: Ubuntu 14.04
psql (PostgreSQL) 9.3.9
PostGIS Version 2.1.2

I've completed importing but osm2pgsql took 1817529 seconds overall to import the data, which comes to about 21 days!

My question is, what can I possibly do to optimize/minimize the import time in the future using the specifications that I've listed above? Please note that I'm really new to this.

Best Answer

I'm not sure where you got your command line from, but you have options there which aren't in the guide

osm2pgsql --help suggests osm2pgsql -c -d gis --slim -C <cache size> -k --flat-nodes <flat nodes> planet-latest.osm.pbf, where * is 20000 on machines with 24GB or more RAM * is a location where a 19GB file can be saved.

If you are not planning on consuming updates, you can add --drop to the command, which will substantially speed it up, and you can also delete the flat nodes file.

On older versions of osm2pgsql, add --number-processes 8.

The other big speed gain is from tuning PostgreSQL, mainly increasing maintenance_work_mem and work_mem, probably to 1GB and 64MB for your server.

If it remains slow, it's probably because your disks are slow. Remote disks can have a high latency, which sucks for databases.

Related Question