I got a box with the current hardware config:
CPU: Intel Xeon E3-1220 V2 3.1 GHz
RAM: 10gb
Hard Disk: Western Digital WD10EZRX SATA Hard Drive
OS: Centos 6.5 64 bit
DB: PostgreSQL 9.3.2
and PostgreSQL got following config
shared_buffers = 1024MB
work_mem = 256MB
maintenance_work_mem = 4096MB
fsync = off
synchronous_commit = off
checkpoint_segments = 100
I imported the entire OSM Europe data into PGSQL DB with the following command:
osm2pgsql -c -d gps -U gps --cache 8000 --number-processes 4 --slim europe-latest.osm.bz2
and it took 6 days to complete… it's a lot but it's ok…
now i want to keep my DB updated with the combo of osmosis + osm2pgsql
cmds:
osmosis --read-replication-interval workingDirectory=/osmosisworkingdir/ --simplify-change --write-xml-change - | \
osm2pgsql --append -d gps -U gps --cache 8000 --number-processes 4 -s -
Well… this takes MORE than one day to just align single day changes! That makes impossible to keep my local DB aligned!
Am i doing something wrong? Should i upgrade my RAM? Why updating is SOOOOOOOOOO damn heavier than load data for the first time?
Best Answer
There's a few issues.
The first one is hardware. Your drive is a WD Green drive, which is generally is about 5400 RPM which is a very slow drive, slower than typical 7200 RPM consumer drives.
One of the biggest tasks in updating is fetching node positions. This is random access, which your drive sucks at.
One option is to use the
--flat-nodes
option, which will use a 20GB binary file to store node locations instead of a table in the database. This can have significant performance boosts on HDDs as it relies more on sequential access than database tables.You don't need or benefit from 8000MB memory being used for node cache when updating. 1000 is probably enough. This will, depending on overcommit settings, allow more memory to be used for caching database content.
Your PostgreSQL
work_mem
andmaintenance_work_mem
are too high for a machine with 10GB. I'd probably go withwork_mem = 64MB
andmaintenance_work_mem = 1024MB
, and even that might be high.You haven't said where you got europe-latest.osm.bz2, but I'm presuming it's a geofabrik extract. If so, they make daily diffs which allow you to update your database with just data from Europe. To do this, you'd need to edit the osmosis configuration.txt file to change baseUrl to
baseUrl=http://download.geofabrik.de/europe-updates
. You'd also have to edit state.txt to the right sequence number, which you can find by looking at the dates in http://download.geofabrik.de/europe-updates/000/000/. Once you find the right state file you can download it, e.g. withcurl -o /osmosisworkingdir/state.txt http://download.geofabrik.de/europe-updates/000/000/416.state.txt
.Switching to daily diffs from geofabrik should cut back the amount of data that is being added.
Lastly, since you have to reimport for --flat-nodes, you should make sure to download a new Europe extract, and make sure to download PBF, not bzipped XML.
Your new import command should look something like
osm2pgsql -c -d gps -U gps --cache 8000 --number-processes 4 --slim \ --flat-nodes europe_nodes.bin europe-latest.osm.pbf
Your update command would be something like
osmosis --read-replication-interval workingDirectory=/osmosisworkingdir/ \ --simplify-change --write-xml-change - | \ osm2pgsql --append -d gps -U gps --cache 1000 --number-processes 4 --slim \ --flat-nodes europe_nodes.bin -
Of course, the biggest speed boost possible is probably from using an area smaller than all of Europe.
Even with all this, rendering tiles off of a 5400RPM drive is never going to be fast.