OpenStreetMap Data – Processing Raw OSM Data for OpenStreetMap.org

openstreetmap

Can anyone provide insight into how OSM data is processed or rendered for www.openstreetmap.org?

A specific example is that I extracted data from a recent planet.osm PostGIS dataset for an area in Missouri. The OSM data needs a lot of cleaning before it can be rendered using the correct styles. Many water bodies are stored as line strings that don't close properly, so I have to use FME for snapping and then polygon building so that I can have blue filled rivers / lakes.

If I look at the same data here the water bodies are rendered as expected.

I'm having trouble identifying all the cases where snapping is required (e.g. which 'Natural' types require it and what the tolerance should be). Also I suspect there are many other data issues that I will never see as I am dealing with all of North America.

Does everyone who downloads and uses OSM data go through their own cleanup process?

Does anyone know how this cleanup is handled by www.openstreetmap.org?

It seems like their process would be the best informed and most tested.

Here is more information on my workflow:

A planet.osm file is downloaded and loaded into PostGIS, using Osmosis, into the pgsql schema. I then extract OSM xml from PostGIS for lots of small areas, again using Osmosis. Each of these small xml files is then converted into Shapefiles using FME and its broad feature categories. It is this stage (OSM xml -> Shp via FME) that I am expecting to convert lines into polygons and perform other cleanup on the data.

These Shapefiles are served up through GeoServer (and cached using GWC).

Best Answer

There are a few different angles to this, and since it's unclear how you're processing data initially, I guess I'll just give an overview.

There are two main ways to consume OSM data - by using osm2pgsql, an older utility that supports 'stylesheets' and differential updates, and Imposm, a newer, Python-based system that supports Python-based stylesheet transforms. When people do processing, a lot of it is in that kind of script. For instance, here's an imposm mapping for osm-bright, the stylesheet upon which MapBox Streets (disclosure/employee) is based.

To be more specific to what you're encountering: it's likely that you aren't processing OSM relations properly. In the data model, relations are what allow multiple linestrings to form polygons. Tools like Imposm and osm2pgsql generally handle this kind of data transformation for you.
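As a rough illustration of why relations matter, the sketch below stitches several open ways into one closed ring by matching shared end nodes, which is the basic idea behind multipolygon assembly. It is a simplified, hypothetical example: the function name `assemble_ring` and the node IDs are invented here, and it is not the algorithm that osm2pgsql or Imposm actually implement (they also handle inner rings, multiple outer rings, and invalid geometry).

```python
def assemble_ring(ways):
    """Join ways (lists of node IDs) end to end into one closed ring.

    Returns the ring as a list of node IDs whose first and last entries
    match, or None if the ways cannot be joined into a single closed ring.
    """
    if not ways:
        return None
    remaining = [list(w) for w in ways]
    ring = remaining.pop(0)
    while ring[0] != ring[-1]:
        for i, way in enumerate(remaining):
            if way[0] == ring[-1]:
                # Way continues from the current end node in its own direction.
                ring.extend(way[1:])
            elif way[-1] == ring[-1]:
                # Way touches the current end node with its last node: reverse it.
                ring.extend(reversed(way[:-1]))
            else:
                continue
            del remaining[i]
            break
        else:
            return None  # no way continues from the current end node
    # A usable ring needs at least three distinct nodes plus the closing node,
    # and every member way must have been consumed.
    if remaining or len(ring) < 4:
        return None
    return ring
```

For example, three member ways of a lake relation given as `[[1, 2, 3], [5, 4, 3], [5, 6, 1]]` join into a single closed ring, whereas `[[1, 2, 3], [3, 4]]` cannot be closed and yields `None`. Snapping unrelated line ends together by tolerance is a different operation and should not be needed when the relation members share end nodes.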

As far as how OSM.org itself does things: edits are in a 'semantic' Postgres database, and continuously imported into a PostGIS database with osmosis, and rendered with Mapnik. There's no manual cleanup step between the database and map rendering, since the two are highly coupled and the map aims to be up-to-date.

Introduction

This will likely require a significant amount of manual work to detect and remove the duplicated data. When detecting and resolving duplicates, you'll want both sources to be in the same geo format: shapefiles, PostGIS databases, or OSM data.

Workflow

The following workflow is based on having both sources of data as OSM before merging and resolving duplicate data.

There are a couple of options for converting the data into OSM:

Convert the shapefile data into OSM however you'd like. Versions of ogr2ogr released in 2013 or later (version 1.10 or later, IIRC) can also convert SHP to OSM. There's also ogr2osm, as you noted. There are a couple of different versions of ogr2osm; I prefer pnorman's, as it's the most up-to-date. Whichever one you use, make sure the translation files are compatible with that version of ogr2osm (for the sake of simplicity, the ones that I've linked to should be compatible with the version of ogr2osm). See here for examples of translation files that are compatible with pnorman's ogr2osm.

Ensure the translation file covers all of the information that you want from your shapefile. The translation file converts the types and attributes of the shapefile into what OSM calls tags, which consist of keys and values.

1a. Run ogr2osm.
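To make the role of the translation file more concrete, here is a minimal sketch of one. It assumes the `filterTags(attrs)` hook used by the translation files that ship with pnorman's ogr2osm; check the examples for the version you actually run, since other versions use a different interface. The shapefile attribute names `NAME` and `TYPE` and the `TYPE_TO_TAGS` mapping are hypothetical and would need to match your own data.

```python
# Hypothetical mapping from a shapefile TYPE attribute to OSM tags.
TYPE_TO_TAGS = {
    "LAKE": {"natural": "water", "water": "lake"},
    "PARK": {"leisure": "park"},
}


def filterTags(attrs):
    """Translate one feature's shapefile attributes into OSM tags."""
    if not attrs:
        return {}
    tags = {}
    # Copy the (assumed) NAME attribute to the OSM name key, skipping blanks.
    name = (attrs.get("NAME") or "").strip()
    if name:
        tags["name"] = name
    # Look up the (assumed) TYPE attribute; unknown types add no tags.
    feature_type = (attrs.get("TYPE") or "").strip().upper()
    tags.update(TYPE_TO_TAGS.get(feature_type, {}))
    return tags
```

Attributes that the function does not mention are dropped, which is why it is worth checking the translation file against every field you want to keep before converting.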

Open JOSM and download the conflation plugin.
Your gov data is now an OSM file. Open JOSM and choose File > Open; your data appears as a layer.
If you already have the OSM data stored locally on your computer, open it in JOSM; it will also open as a new layer.
Merging these two sources of data together and resolving the duplicate data is known as conflation. Run the conflation plugin and resolve all of the conflicts.

If JOSM runs out of memory (e.g., when using large files), split the data by feature type and complete this workflow multiple times, each time with a different kind of data (e.g. boundaries and land uses; highways; buildings), and then finally merge the OSM files together using osmium or another tool.
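To give a feel for what conflation involves, the sketch below pairs each point from one source with the nearest point from the other source inside a distance threshold, so that likely duplicates can be reviewed. This is only an illustration, not how the JOSM conflation plugin works internally; the function names, the `(id, lat, lon)` tuple layout, and the 25 m default threshold are all assumptions made for the example.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points."""
    r = 6371000.0  # mean Earth radius in metres
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def candidate_duplicates(source, reference, max_dist_m=25.0):
    """Pair each source point with its nearest reference point within max_dist_m.

    Both inputs are lists of (id, lat, lon) tuples. Returns a list of
    (source_id, reference_id) pairs; unmatched source points are omitted.
    """
    pairs = []
    for sid, slat, slon in source:
        best = None  # (reference_id, distance) of the closest match so far
        for rid, rlat, rlon in reference:
            d = haversine_m(slat, slon, rlat, rlon)
            if d <= max_dist_m and (best is None or d < best[1]):
                best = (rid, d)
        if best is not None:
            pairs.append((sid, best[0]))
    return pairs
```

The nested loop is quadratic, so it only suits small extracts; for larger areas a spatial index would be the usual choice. Matches found this way are candidates only and still need the manual review described above.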

B. JOSM can also read shapefiles although SHP support isn't perfect and this method assumes the shapefile can be loaded entirely into memory...

Start JOSM.
Open the shapefile (e.g., filename.shp).
Select all.
In JOSM, edit the attributes and properties that were imported from the SHP so that each attribute corresponds to an OSM tag.
Save as OSM format.
Continue from A4 and conflate.

Import as OSM

Import the OpenStreetMap data into the system as follows:

Change to the directory containing OpenStreetMap (OSM) files converted using JOSM.
Enable the hstore extension in the database, then run osm2pgsql from the shell:

    CREATE EXTENSION hstore;
    osm2pgsql -j -W \
              -d osm filename.osm

The -j option is key, as it instructs osm2pgsql to import all tags into an hstore column, thus preserving the underlying data structure.
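Once the tags are in an hstore column, rows can be filtered by any key and value, not only by the columns osm2pgsql creates by default. The sketch below builds such a query; it assumes the hstore column is named `tags` and that the table has `osm_id` and `way` columns, as in a default osm2pgsql import. The helper name `tag_filter_query` is hypothetical, and the `%s` placeholders follow the style used by psycopg2.

```python
import re


def tag_filter_query(table, key, value):
    """Build a parameterised query for rows whose hstore tags contain key=value.

    Returns (sql, params), ready to pass to a DB-API cursor.execute() call.
    """
    # Table names cannot be passed as query parameters, so accept only
    # plain identifiers to avoid SQL injection.
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", table):
        raise ValueError(f"unsafe table name: {table!r}")
    sql = (
        f"SELECT osm_id, tags, way FROM {table} "
        "WHERE tags @> hstore(%s, %s)"
    )
    return sql, (key, value)
```

For example, `tag_filter_query("planet_osm_polygon", "natural", "water")` returns a query for water polygons; substitute whatever table prefix your import used.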

Create Mapnik Layer

To have the data appear on the map, add a layer and a style for that layer. This can be as simple as the following:

Edit mapnik-stylesheets/osm.xml.
Insert the following XML code before the closing </Map> tag...

...

<Layer name="prefix_zone" status="on" srs="&osm2pgsql_projection;">
  <StyleName>zones</StyleName>
  <Datasource>
    <Parameter name="table">
    (select way from prefix_line order by tags desc, z_order) as zones
    </Parameter>
    &datasource-settings;
  </Datasource>
</Layer>

Create Mapnik Style

Continuing from the previous section:

Find the last </Style> tag (around line 3350).
Insert the following XML code before the &layer-shapefiles; directive:

...

<Style name="zones">
  <Rule>
    &maxscale_zoom1;
    &minscale_zoom19;
    <LineSymbolizer stroke="#0065BD" stroke-width="2.5" />
  </Rule>
</Style>

Roadmatcher

RoadMatcher is another tool that might be helpful.
