[GIS] Update database via osmosis and osm2pgsql too slow

Tags: openstreetmap, osm2pgsql, osmosis, postgresql

I have a box with the following hardware config:

CPU: Intel Xeon E3-1220 V2 3.1 GHz
RAM: 10 GB
Hard Disk: Western Digital WD10EZRX SATA Hard Drive

OS: CentOS 6.5 64-bit
DB: PostgreSQL 9.3.2

and PostgreSQL has the following config:

shared_buffers = 1024MB
work_mem = 256MB
maintenance_work_mem = 4096MB
fsync = off
synchronous_commit = off
checkpoint_segments = 100

I imported the full OSM Europe dataset into the PostgreSQL DB with the following command:

osm2pgsql -c -d gps -U gps --cache 8000 --number-processes 4 --slim europe-latest.osm.bz2

and it took 6 days to complete… that's a lot, but it's OK…

Now I want to keep my DB updated with the combination of osmosis + osm2pgsql commands:

osmosis --read-replication-interval workingDirectory=/osmosisworkingdir/ --simplify-change --write-xml-change - | \
osm2pgsql --append -d gps -U gps --cache 8000 --number-processes 4 -s -

Well… this takes MORE than one day just to apply a single day's changes! That makes it impossible to keep my local DB in sync!

Am I doing something wrong? Should I upgrade my RAM? Why is updating so much heavier than loading the data the first time?

Best Answer

There are a few issues.

The first one is hardware. Your WD10EZRX is a WD Green drive, which generally spins at about 5400 RPM, making it slower than typical 7200 RPM consumer drives.

One of the biggest tasks in updating is fetching node positions. This is random access, which your drive sucks at.

One option is to use the --flat-nodes option, which stores node locations in a roughly 20 GB binary file instead of a table in the database. This can give a significant performance boost on HDDs because it relies more on sequential access than database tables do.

You don't need, or benefit from, 8000 MB of memory being used for the node cache when updating; 1000 MB is probably enough. Depending on overcommit settings, this frees more memory for the OS to cache database content.

Your PostgreSQL work_mem and maintenance_work_mem are too high for a machine with 10 GB of RAM. I'd probably go with work_mem = 64MB and maintenance_work_mem = 1024MB, and even that might be high.
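As a rough sketch (treat these as starting values, not tuned settings; the exact numbers depend on your workload), the relevant postgresql.conf lines could look like:

# postgresql.conf -- starting point for a 10 GB machine running osm2pgsql (values are suggestions)
shared_buffers = 1024MB          # as in your current config
work_mem = 64MB                  # per-sort/hash memory; lower it to avoid overcommitting RAM
maintenance_work_mem = 1024MB    # used for index builds during import
fsync = off                      # only acceptable if you can re-import after a crash
synchronous_commit = off
checkpoint_segments = 100

Reload or restart PostgreSQL after changing these.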

You haven't said where you got europe-latest.osm.bz2, but I'm presuming it's a Geofabrik extract. If so, they publish daily diffs that let you update your database with just the data for Europe. To use them, edit the osmosis configuration.txt file and change baseUrl to baseUrl=http://download.geofabrik.de/europe-updates. You also have to set state.txt to the right sequence number, which you can find by looking at the dates in http://download.geofabrik.de/europe-updates/000/000/. Once you find the right state file you can download it, e.g. with curl -o /osmosisworkingdir/state.txt http://download.geofabrik.de/europe-updates/000/000/416.state.txt.
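For reference, the configuration.txt in your osmosis working directory would then look something like this (the maxInterval value is a suggestion; 86400 seconds caps each run at one day of changes):

# /osmosisworkingdir/configuration.txt (sketch)
baseUrl=http://download.geofabrik.de/europe-updates
maxInterval=86400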

Switching to daily diffs from Geofabrik should cut back the amount of data being applied.

Lastly, since you have to reimport to use --flat-nodes, you should download a fresh Europe extract, and make sure to download the PBF, not the bzipped XML.
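For example (the URL follows the usual Geofabrik naming; adjust if your extract lives elsewhere):

wget http://download.geofabrik.de/europe-latest.osm.pbf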

Your new import command should look something like

osm2pgsql -c -d gps -U gps --cache 8000 --number-processes 4 --slim \
    --flat-nodes europe_nodes.bin europe-latest.osm.pbf

Your update command would be something like

osmosis --read-replication-interval workingDirectory=/osmosisworkingdir/ \
    --simplify-change --write-xml-change - | \
osm2pgsql --append -d gps -U gps --cache 1000 --number-processes 4 --slim \
    --flat-nodes europe_nodes.bin -

Of course, the biggest speed boost possible is probably from using an area smaller than all of Europe.
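For instance, if you only need one country, you could import a country extract and point replication at its update directory instead (Italy here is purely illustrative; the URLs follow the usual Geofabrik layout):

wget http://download.geofabrik.de/europe/italy-latest.osm.pbf
# and in configuration.txt:
# baseUrl=http://download.geofabrik.de/europe/italy-updates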

Even with all this, rendering tiles off of a 5400RPM drive is never going to be fast.