[GIS] Optimizing osm2pgsql imports for OSM data

mapnik openstreetmap optimization postgis postgresql

I'm building an EC2 instance on which to import the entire Planet.osm snapshot for some projects we're working on. I've spun up a Large Ubuntu x64 instance, attached plenty of separate storage on an EBS volume, and configured Postgres to keep its data there.

The server is having trouble importing the snapshot with osm2pgsql. After a couple of attempts with different memory configurations, the process keeps printing "Killed" after getting most of the way through: once it was killed while "going over pending ways", and the next time, after I slightly adjusted the slim cache, it reached "processing ways" before being killed. From what I've read, this is generally due to memory issues.
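One way to confirm the memory theory is to look for out-of-memory killer messages in the kernel log. The sketch below is only a filter over log text; the function name `find_oom_kills` is made up for illustration, and the exact wording of kernel messages varies between kernel versions, so the pattern is deliberately loose.

```shell
# Hypothetical helper: keep only lines that look like OOM-killer activity.
# With no file arguments, grep reads from standard input.
find_oom_kills() {
    grep -i -E 'out of memory|oom-killer|killed process' "$@"
}

# Typical use on the import host (reading the kernel log may need root):
#   dmesg | find_oom_kills
```

If osm2pgsql or postgres shows up in the matching lines, the "Killed" message most likely came from the kernel reclaiming memory rather than from osm2pgsql itself.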

Here's my latest attempt to run the import:

osm2pgsql -v -U osm -s -C 4096 -S default.style -d osm /data/osm/planet-latest.osm.bz2

And here are the specs for a Large instance on EC2:

Large Instance 7.5 GB of memory, 4 EC2 Compute Units (2 virtual cores with 2 EC2 Compute Units each), 850 GB of local instance storage, 64-bit platform
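As a rough back-of-the-envelope check on those numbers, the sketch below subtracts the cache size given to `-C` from the instance's total memory. The function name `remaining_mb` is hypothetical, and the result is only an upper bound on what is left, since osm2pgsql also uses memory beyond its node cache.

```shell
# Hypothetical helper: RAM left over once the osm2pgsql cache is fully used.
#   $1 = total RAM in MB, $2 = value passed to -C in MB
remaining_mb() {
    echo $(( $1 - $2 ))
}

# 7.5 GB = 7680 MB on a Large instance; -C 4096 as in the command above.
remaining_mb 7680 4096   # prints 3584
```

That leaves about 3.5 GB to be shared by PostgreSQL, the operating system, and the rest of the osm2pgsql process.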

My question is: are there good benchmark resources for determining the tuning requirements for osm2pgsql and Postgres? Import speed isn't that important to me; I'd just like to make sure the process completes safely, even if it takes 4 or 5 days. I've read through Frederik Ramm's "Optimising the rendering chain" (PDF) from last year's SOTM, but are there other good opinions or resources?

Best Answer

As the documentation says, you may need more than 256 GB of RAM to do that.

I don't know much about EC2, but you can try the slim (--slim) mode or try Osmosis.

There is an interesting post at http://weait.com/content/build-your-own-openstreetmap-server which says, 'you must use slim mode'.
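For reference, `-s` in the command from the question is the short form of `--slim`, and `-C` is the short form of `--cache`. The sketch below only prints the same command with the long option names so the slim-mode setting is easy to see; it does not run the import. The function name `print_import_cmd` is made up, and the 2048 MB cache is an illustrative assumption, not a tested recommendation.

```shell
# Hypothetical dry-run helper: prints the import command, does not execute it.
#   $1 = cache size in MB (value for --cache), $2 = path to the planet file
print_import_cmd() {
    cache_mb=$1
    planet=$2
    echo "osm2pgsql --verbose --username osm --slim --cache $cache_mb" \
         "--style default.style --database osm $planet"
}

# Assumed smaller cache than the 4096 used in the question.
print_import_cmd 2048 /data/osm/planet-latest.osm.bz2
```

Keeping the cache size as a parameter makes it easy to try a lower value if the kernel keeps killing the process.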