The short answer is, there's no outstanding way.
Options that won't work well
Hadoop
Hadoop and similar tools aren't the solution, as it's entirely possible to do this type of analysis on a single reasonably powerful server. If you don't have such a server, Hadoop still isn't a good option, since it needs a whole cluster.
If you happen to have a Hadoop cluster and are an expert in using it, then using it is reasonable, but otherwise it's more development time for no gain.
Vector tiles
Vector tiles don't remove any processing steps, they just allow some of the work to be shared by multiple styles. As you've seen with Mapbox Streets' styling, buildings aren't often in low-zoom vector tiles, so you'd have to generate them yourself.
You could stitch together low-zoom vector tiles, but you'd have to use your own rendering toolchain for that, and it would be complex.
Reasonable options
OSM has about 160 million building ways, 35 million address nodes, and 21 million ways with addresses. Most of the last group are also buildings.
osm2pgsql
osm2pgsql can handle this on reasonable hardware if you take care to exclude other data. To do this, you want a custom .style file that includes only address and building tags. Starting from empty.style, the suggested starting point, we get:
node,way addr:unit text linear
node,way addr:housename text linear
node,way addr:housenumber text linear
node,way addr:street text linear
way building text polygon
# Everything below here is copied from empty.style, with the "building" and z_order lines removed.
# The former is above as an actual column, the latter isn't relevant for this use
# way_area is included, as it is useful
way abandoned:aeroway text phstore
way abandoned:amenity text phstore
way abandoned:building text phstore
way abandoned:landuse text phstore
way abandoned:power text phstore
way area:highway text phstore
node,way aeroway text phstore
node,way amenity text phstore
way building:part text phstore
node,way harbour text phstore
node,way historic text phstore
node,way landuse text phstore
node,way leisure text phstore
node,way man_made text phstore
node,way military text phstore
node,way natural text phstore
node,way office text phstore
node,way place text phstore
node,way power text phstore
node,way public_transport text phstore
node,way shop text phstore
node,way sport text phstore
node,way tourism text phstore
node,way water text phstore
node,way waterway text phstore
node,way wetland text phstore
way way_area real linear # This is calculated during import
Save it as buildings.style
You can then import the planet, using something based on the suggested osm2pgsql command line for the planet:
osm2pgsql -c -d buildings --style /path/to/buildings.style --slim --drop -C <cache size> --flat-nodes <flat nodes> /path/to/planet-latest.osm.pbf
where
- <cache size> is 24000 on machines with 32GiB or more RAM, or about 75% of memory in MiB on machines with less
- <flat nodes> is a location where a 24GiB file can be saved.
A couple of the options used deserve an explanation:
- --drop gets rid of tables used only during the import and during updates, as I'm assuming you'll update by reimporting
- --style /path/to/buildings.style tells osm2pgsql to use the style we wrote above
This will take a day or two on a reasonably powered server.
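The cache-size rule and the command line above can be put together in a short script. This is only a sketch: the function names are made up for illustration, `/path/to/flat.nodes` is a hypothetical location for the flat-nodes file, and only the flags already shown above are used.

```python
import shlex


def cache_size_mib(ram_mib: int) -> int:
    """Value for -C, following the rule of thumb given above."""
    if ram_mib >= 32 * 1024:
        # 32GiB or more RAM: use a fixed 24000
        return 24000
    # Less RAM: about 75% of memory, in MiB
    return int(ram_mib * 0.75)


def import_command(ram_mib, style, flat_nodes, planet, database="buildings"):
    """Build the osm2pgsql argument list for a buildings-only planet import."""
    return [
        "osm2pgsql", "-c", "-d", database,
        "--style", style,
        "--slim", "--drop",
        "-C", str(cache_size_mib(ram_mib)),
        "--flat-nodes", flat_nodes,
        planet,
    ]


if __name__ == "__main__":
    cmd = import_command(
        ram_mib=16 * 1024,                     # a machine with 16GiB of RAM
        style="/path/to/buildings.style",
        flat_nodes="/path/to/flat.nodes",      # hypothetical path
        planet="/path/to/planet-latest.osm.pbf",
    )
    # Print the command so it can be reviewed before being run by hand
    print(shlex.join(cmd))
```

Printing the command rather than running it keeps the script harmless; pass the list to `subprocess.run` once you are happy with it.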
Once the import is done, there are a few partial indexes you can add to help performance:
CREATE INDEX planet_osm_polygon_area_18250_idx ON planet_osm_polygon USING gist (way) WHERE way_area > 18250;
CREATE INDEX planet_osm_polygon_area_1140_idx ON planet_osm_polygon USING gist (way) WHERE way_area > 1140;
CREATE INDEX planet_osm_polygon_area_71_idx ON planet_osm_polygon USING gist (way) WHERE way_area > 71;
When defining the layers in Kosmtik, TileMill, or another map design studio, include the condition
WHERE way_area > 0.05*!pixel_width!::real*!pixel_height!::real
on any polygon layers.
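To get a feel for what that condition filters out, the sketch below computes the area cutoff per zoom level. It assumes 256-pixel tiles and the default Web Mercator projection, so that `!pixel_width!` and `!pixel_height!` are both the ground size of one pixel in projected metres; the function names are mine, not part of any tool.

```python
import math

# Web Mercator world width in projected metres (2 * pi * WGS84 equatorial radius)
EARTH_CIRCUMFERENCE_M = 2 * math.pi * 6378137.0
TILE_SIZE_PX = 256  # assumption: standard 256-pixel tiles


def pixel_size_m(zoom: int) -> float:
    """Side length of one pixel at the given zoom, in projected metres."""
    return EARTH_CIRCUMFERENCE_M / (TILE_SIZE_PX * 2 ** zoom)


def min_way_area(zoom: int, factor: float = 0.05) -> float:
    """Smallest way_area that passes the layer condition at this zoom."""
    return factor * pixel_size_m(zoom) ** 2


if __name__ == "__main__":
    for zoom in range(6, 15):
        print(f"z{zoom}: way_area > {min_way_area(zoom):.1f}")
```

Under these assumptions the cutoffs at zooms 8, 10, and 12 come out at roughly 18,700, 1,170, and 73, each slightly above one of the index thresholds (18250, 1140, 71). That the indexes were chosen to match those zooms is my reading of the numbers, not something stated here.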
Some things to watch for when visualizing
- Rendering lots of little polygons doesn't work well. That, as well as performance, is why there's the area cutoff for layers.
- It's still going to be slow, but pre-rendering the US should be entirely reasonable
- Test on a small area first
https://switch2osm.org/loading-osm-data/ has more information on setting up PostgreSQL and installing osm2pgsql
libosmium
libosmium is a library for working with OSM data, and probably the best option for operating directly on the planet file.
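As a rough illustration, here is a sketch that counts buildings and addressed objects in a file using pyosmium, the Python bindings for libosmium. It assumes pyosmium is installed, and it treats any `addr:*` tag as an address, which is my assumption rather than a definition given here. The helper names are hypothetical.

```python
import sys
from collections import Counter


def classify(kind, tags):
    """Return the counter keys that one OSM object contributes to.

    kind is "node" or "way"; tags is a plain dict of the object's tags.
    Assumption: any key starting with "addr:" marks an address.
    """
    keys = []
    if kind == "way" and "building" in tags:
        keys.append("building ways")
    if any(key.startswith("addr:") for key in tags):
        keys.append("address nodes" if kind == "node" else "ways with addresses")
    return keys


def count_file(path):
    """Stream an OSM file once and tally buildings and addresses."""
    import osmium  # pyosmium, a third-party package; imported lazily

    counts = Counter()

    class Handler(osmium.SimpleHandler):
        def node(self, n):
            counts.update(classify("node", {t.k: t.v for t in n.tags}))

        def way(self, w):
            counts.update(classify("way", {t.k: t.v for t in w.tags}))

    Handler().apply_file(path)
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    for name, total in count_file(sys.argv[1]).items():
        print(f"{name}: {total}")
```

Try it on a small regional extract first; a Python handler over the whole planet will be slow, and the C++ library itself is the better fit for heavy processing.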
QGIS or ArcGIS
If you use this much data in QGIS or ArcGIS, you will probably want to script it, but you could then do a more sophisticated analysis.
osm2pgsql multi-backend
This is very similar to the osm2pgsql approach above, but features a more flexible backend, which can result in tables better suited for exporting to another format.
Best Answer
I am not aware of any converter from Mapnik styles to MapServer mapfile styles.
You will need quite a few tools for what you want to do. The OSM world has its own set of tools because of the complexity of its database and its storage format (XML).
I don't recommend using shapefiles for rendering OSM data. If you need a lot of custom styles, the shapefiles don't give you all the attributes required for that. You can always look at this minimal mapfile from the OSM website.
IMHO it's better to follow the instructions available on the official MapServer wiki. You will need osm2pgsql, but with it you can update your DB regularly (worthwhile when you see the size of the full OSM DB).
Instructions depend on your OS:
Ubuntu
Windows
In the end, it's your choice ;)