[GIS] Alternative approach for loading osm2pgsql data

Tags: openstreetmap, osm2pgsql, postgresql

I am slowly going insane trying to successfully complete an OSM data import and I would like to find an alternative approach. What I'm attempting is frustratingly simple yet fails repeatedly.

Originally I attempted to import the latest planet .osm.bz2 file in slim mode, but this failed with a parsing error. Then I tried to import the .pbf in slim mode, and it was moving so slowly that it would have taken three months. Then I had the memory in my VM increased to 48GB and removed the -s slim mode argument, and the import failed when it ran out of memory. Finally, I attempted to import the .pbf in slim mode with a 44GB node cache; approximately halfway through the nodes, after working for several days, it failed with "Zlib compression failed (code -3, incorrect data check)".

There has to be a simpler, faster, and more reliable way to do this. Ideally I would like to simply download someone else's Postgres data file, attach it to my database, sit back and relax.

I need the data to be in the osm2pgsql schema but I really don't care how it gets there, as long as it's no more than a couple of months old. Does anyone know why this is so complicated, and how to make it simpler?

Best Answer

The osm2pgsql documentation suggests

  osm2pgsql -c -d gis --slim -C <cache size> --flat-nodes <flat nodes> planet-latest.osm.pbf

where

  • <cache size> is 24000 on machines with 32GiB or more of RAM, or about 75% of memory in MiB on machines with less
  • <flat nodes> is a location where a 24GiB file can be saved.

This works, and on a fast machine will take under a day.

Because you have a lot of RAM, I'd go with a 28000 cache. There is no point going as high as 35000, since that is already more than is needed to store every node.
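
Filled in with those numbers, the command would look something like this (the database name gis and the flat-nodes path are placeholders; adjust both for your setup):

  osm2pgsql -c -d gis --slim -C 28000 --flat-nodes /path/to/flat-nodes.bin planet-latest.osm.pbf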

You should

  • use PBF, not bzipped XML

    Given your previous problem, also download the .md5 file and verify the download with md5sum -c, e.g. md5sum -c planet-latest.osm.pbf.md5. This will catch corruption during download or while saving to disk (there is a full sketch after this list).

  • use slim mode

    As the osm2pgsql documentation notes, non-slim mode also uses a way cache that takes up about as much RAM as the node cache, so you cannot do a non-slim import of the planet with 48GB of RAM.

  • decide on an update strategy

    At some point, you will want to update your database. If you plan to do it infrequently, you'll probably just drop the database and reload the latest data. If this is your plan, add the --drop flag to your command line; the slim tables will then be dropped instead of indexed, saving over 100GB of disk space and hours of time. If you use --drop, you can also delete the flat-nodes file afterwards (see the sketch after this list).

    If you plan on updating using minutely, hourly, or daily diffs, then you can't use --drop.

  • specify the number of CPU cores to use with --number-processes

    This will help the "pending ways" section of the import. Recent (0.88.x) versions of osm2pgsql default to the number of hardware threads, but older versions default to 1.
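
Putting these pieces together, a one-shot reload (no diff updates, so --drop is fine) might look roughly like this; the planet.openstreetmap.org URLs, the core count, and the file paths are assumptions to substitute with your own:

  wget https://planet.openstreetmap.org/pbf/planet-latest.osm.pbf
  wget https://planet.openstreetmap.org/pbf/planet-latest.osm.pbf.md5
  md5sum -c planet-latest.osm.pbf.md5
  osm2pgsql -c -d gis --slim --drop -C 28000 --flat-nodes /path/to/flat-nodes.bin --number-processes 8 planet-latest.osm.pbf

If you plan to apply diffs instead, run the same import without --drop and keep the flat-nodes file around for updates.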

You've mentioned using a VM. Many VMs have extremely slow disks when doing random IO, which may limit your speed.
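
If you want to gauge the disk before committing to a multi-day import, a quick random-read benchmark is one way to check; this fio invocation is a sketch (the test file path and sizes are arbitrary) and should be run on the disk that will hold the database:

  fio --name=randread --rw=randread --bs=4k --size=1G --direct=1 --ioengine=libaio --runtime=60 --time_based --filename=/path/to/fio-testfile

Delete the test file afterwards.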
