OpenStreetMap Data – Processing Raw OSM Data for OpenStreetMap.org

openstreetmap

Can anyone provide insight into how OSM data is processed or rendered for www.openstreetmap.org?

A specific example is that I extracted data from a recent planet.osm PostGIS dataset for an area in Missouri. The OSM data needs a lot of cleaning before it can be rendered using the correct styles. Many water bodies are stored as line strings that don't close properly, so I have to use FME for snapping and then polygon building so that I can have blue filled rivers / lakes.

If I look at the same data here the water bodies are rendered as expected.

I'm having trouble identifying all the cases where snapping is required (e.g. which 'Natural' types require it and what the tolerance should be). Also I suspect there are many other data issues that I will never see as I am dealing with all of North America.

Does everyone who downloads and uses OSM data go through their own cleanup process?

Does anyone know how this cleanup is handled by www.openstreetmap.org?

It seems like their process would be the best informed and most tested.

Here is more information on my workflow:

A planet.osm file is downloaded and loaded into PostGIS, using Osmosis, into the pgsql schema. I then extract OSM XML from PostGIS for lots of small areas, again using Osmosis. Each of these small XML files is then converted into Shapefiles using FME and its broad feature categories. It is at this stage (OSM XML -> Shp via FME) that I expect to convert lines into polygons and perform other cleanup on the data.

These Shapefiles are served up through GeoServer (and cached using GWC).

Best Answer

There are a few different angles to this, and since it's unclear how you're processing data initially, I guess I'll just give an overview.

There are two main ways to consume OSM data: osm2pgsql, an older utility that supports 'stylesheets' and differential updates, and Imposm, a newer, Python-based system that supports Python-based stylesheet transforms. When people do processing, a lot of it lives in that kind of script. For instance, here's an imposm mapping for osm-bright, the stylesheet upon which MapBox Streets is based (disclosure: I'm an employee).

To be more specific to what you're encountering, it's likely that you aren't processing OSM relations properly. In the data model, relations are what allow multiple linestrings (ways) to form a single polygon, so a lake or river outline that looks like a set of unclosed lines is often just the members of one relation. Tools like Imposm and osm2pgsql generally handle this kind of data transformation for you.
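To give a rough idea of what that transformation involves, here is a minimal sketch in plain Python that joins the member ways of a relation into closed rings by matching shared end nodes. The function name `assemble_rings` and the node ids are made up for illustration, and this is not how osm2pgsql or Imposm are actually implemented; they also deal with inner/outer roles, geometry validity and other cases this ignores.

```python
def assemble_rings(ways):
    """Join ways (lists of node ids) end to end into closed rings.

    Returns (rings, leftovers). A ring is a node list whose first and
    last ids are equal. Leftovers are chains that could not be closed;
    an unclosed chain may come back in more than one piece.
    """
    remaining = [list(w) for w in ways if len(w) >= 2]
    rings, leftovers = [], []
    while remaining:
        current = remaining.pop()
        extended = True
        # Keep appending ways that touch the tail until the ring closes
        # or nothing else connects.
        while current[0] != current[-1] and extended:
            extended = False
            for i, way in enumerate(remaining):
                if way[0] == current[-1]:
                    current.extend(way[1:])
                elif way[-1] == current[-1]:
                    # Way is stored in the opposite direction: reverse it.
                    current.extend(reversed(way[:-1]))
                else:
                    continue
                del remaining[i]
                extended = True
                break
        if current[0] == current[-1] and len(current) >= 4:
            rings.append(current)
        else:
            leftovers.append(current)
    return rings, leftovers


# Hypothetical member ways of one "lake" relation, as lists of node ids.
member_ways = [
    [1, 2, 3],           # first part of the shoreline
    [5, 4, 3],           # second part, drawn in the opposite direction
    [5, 6, 1],           # third part, closes the outline
    [10, 11, 12, 10],    # a way that is already closed
    [20, 21],            # a stray line that belongs to no ring
]
rings, leftovers = assemble_rings(member_ways)
```

Note that the joining is done on shared node ids, not on coordinates, so no snapping tolerance is involved. If the ways in a relation really share their end nodes, they close exactly; anything that ends up in `leftovers` is a genuine gap in the data or a line that was never meant to be an area.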

As for how OSM.org itself does things: edits are stored in a 'semantic' Postgres database, continuously imported into a PostGIS database with osmosis, and rendered with Mapnik. There's no manual cleanup step between the database and map rendering, since the two are tightly coupled and the map aims to be up to date.
