[GIS] OpenStreetMap – Determining number of nodes, ways, and relations in a PBF file

openstreetmap osm2pgsql

I'm currently using osm2pgsql to import a subsection of North America into PostGIS so I can render my own basemap tiles. Is there any way to get an accurate count of the nodes, ways, and relations in the PBF file I am generating, so I can estimate how long the import will take on my computer?

Best Answer

You can run osm2pgsql with the null output, the RAM cache disabled (--cache 0), and slim mode off. For a 2.5 GB PBF extract of Canada, it took about 37 seconds to count the nodes, ways, and relations on an NVMe SSD.

$ osm2pgsql --output null --cache 0 canada-latest.osm.pbf
osm2pgsql version 0.96.0 (64 bit id space)

WARNING: ram cache is disabled. This will likely slow down processing a lot.

Using projection SRS 3857 (Spherical Mercator)
Allocating memory for dense node cache
Allocating dense node cache in one big chunk
Allocating memory for sparse node cache
Sharing dense sparse
Node-cache: cache=0MB, maxblocks=0*65536, allocation method=3

Reading in file: canada-latest.osm.pbf
Using PBF parser.
Processing: Node(327502k 32750.3k/s) Way(18801k 895.29k/s) Relation(0 0.00/s)  parse time: 31s
Node stats: total(327502800), max(6338602157) in 10s
Way stats: total(19016827), max(676910855) in 21s
Relation stats: total(321360), max(9397389) in 0s

Going over pending ways...
        0 ways are pending

Using 12 helper-processes
Finished processing 0 ways in 0 s


Going over pending relations...
        0 relations are pending

Using 12 helper-processes
Finished processing 0 relations in 0 s

node cache: stored: 0(0.00%), storage efficiency: nan% (dense blocks: 0, sparse nodes: 0), hit rate: nan%

Osm2pgsql took 37s overall

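Note that each stats line reports two numbers: total, the number of objects in this file, and max, the highest object ID seen, which reflects the ID space of the whole OSM database rather than the size of the extract.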
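If you want the totals in a script rather than reading them off the screen, something like the following could pull them out of the log. This is only a sketch: the line format is taken from the 0.96.0 output above and other osm2pgsql versions may print it differently, and `parse_stats`, `STATS_RE`, and `LOG` are names made up for this example.

```python
import re

# Sample lines copied from the osm2pgsql 0.96.0 output above.
LOG = """\
Node stats: total(327502800), max(6338602157) in 10s
Way stats: total(19016827), max(676910855) in 21s
Relation stats: total(321360), max(9397389) in 0s
"""

# Matches e.g. "Node stats: total(327502800), max(6338602157) in 10s".
# Assumption: this format is specific to the version shown above.
STATS_RE = re.compile(
    r"^(Node|Way|Relation) stats: total\((\d+)\), max\((\d+)\)",
    re.MULTILINE,
)


def parse_stats(log_text):
    """Return {"node": (total, max_id), ...} for each stats line found."""
    return {
        kind.lower(): (int(total), int(max_id))
        for kind, total, max_id in STATS_RE.findall(log_text)
    }


counts = parse_stats(LOG)
for kind, (total, max_id) in counts.items():
    print(f"{kind}: {total:,} objects, highest ID {max_id:,}")
```

In practice you would replace `LOG` with the text you captured from your own run.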

The node count can then be multiplied by 8 bytes to get a good estimate for the cache size when you import into PostgreSQL. In this case, 327502800 * 8 bytes = 2,620,022,400 bytes, or about 2.44 GiB. The osm2pgsql manual recommends adding 10 to 30% overhead on top of that cache value.
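The arithmetic above is easy to script if you are trying several extracts. Here is a rough sketch; `cache_size_mb` is a hypothetical helper, the 8 bytes per node and the 10 to 30% overhead come from the paragraph above, and treating one megabyte as 1024 * 1024 bytes is my assumption (it matches the 2.44 figure).

```python
import math


def cache_size_mb(node_count, overhead=0.2, bytes_per_node=8):
    """Estimate a --cache value in MB for a given node count.

    overhead is the extra fraction added on top (0.1 to 0.3 per the
    recommendation quoted above). Assumes 1 MB = 1024 * 1024 bytes.
    """
    raw_bytes = node_count * bytes_per_node
    return math.ceil(raw_bytes * (1 + overhead) / (1024 * 1024))


nodes = 327502800  # "Node stats: total(...)" from the Canada extract
low = cache_size_mb(nodes, overhead=0.1)
high = cache_size_mb(nodes, overhead=0.3)
print(f"suggested --cache between {low} and {high} MB")
```

For the Canada extract this comes out at roughly 2,750 to 3,250 MB, so make sure the machine has that much RAM to spare before passing it to --cache.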
