[GIS] D3 for maps: at what stage to bring in data to the geo

d3 topojson

I'd like to draw a world choropleth map for display with D3.

I have a dataset I'd like to display that's keyed by ISO alpha-3 country codes. So…

danger.csv
iso,level
AFG,100
ALB,0
DZA,12

etc.

Following the TopoJSON instructions, I know I can run…

wget "http://www.naturalearthdata.com/http//www.naturalearthdata.com/download/50m/cultural/ne_50m_admin_0_countries.zip"
unzip ne_50m_admin_0_countries.zip
ogr2ogr -f "GeoJSON" output_features.json ne_50m_admin_0_countries.shp -select iso_a3
topojson -o topo.json output_features.json --id-property iso_a3

to produce a world map TopoJSON file whose features are ID'd by ISO3 code.

My question is: at what point in the workflow should I merge the data from danger.csv onto the geo data? I had previously worked with QGIS as a GUI, but where /should/ the merge happen? In the .shp? After the ogr2ogr? Dynamically in the browser after the topojson shrink (as in http://bl.ocks.org/mbostock/4060606 and http://bl.ocks.org/mbostock/3306362)?

I'm pretty good with python, but pretty new to javascript, and find myself copying and pasting Bostock examples more than actually being a generative coder there.

(I also have a related, but more involved, follow-up on Stack Overflow that maybe I should migrate here: https://stackoverflow.com/questions/18604877/how-to-do-time-data-in-d3-maps)

Best Answer

Ask yourself two questions:

  1. Are you going to reuse the geography on multiple datasets?

    If you’ll use the same geography with multiple datasets, then it makes sense to keep the geography and data separate, and join them in the client. Many of my examples have separate CSV (or TSV) files for this reason. This way, the TopoJSON for U.S. states and counties or likewise world countries can be reused, instead of creating separate TopoJSON for every example.

    On the other hand, if you’ll only use this geography once, then you should probably “bake” the data into the geography as properties, if only to simplify the code. This approach is simpler because you only need to load a single file (so no queue.js), and since the data is stored as properties of each feature, you don’t need to join the data in the client (so no d3.map).

    Side note: TSV and CSV are often much more efficient at storing properties than GeoJSON and TopoJSON, simply because the latter must repeat property names on every object. File size can be another reason to store your data in a separate file and join it in the client.

  2. Is your data already bound to geography (e.g., a property of your shapefile)?

    Assuming you answered “no” to the first question and want to bake the data into the geography (rather than doing it in the client), how you do this depends on the format of the data.

    If your data is already a property of your shapefile, then use topojson -p to control which properties are saved to the generated TopoJSON file. You can also use this option to rename properties and coerce them to numbers. See Let’s Make a Map for examples.

    If your data is in a separate CSV or TSV file, then use topojson -e (in addition to -p) to specify an external properties file that can be joined to your geographic features. Cribbing the example from the wiki, if you had a TSV file like this:

    FIPS    rate
    1001    .097
    1003    .091
    1005    .134
    1007    .121
    1009    .099
    1011    .164
    1013    .167
    1015    .108
    1017    .186
    1019    .118
    1021    .099
    

    Using -e, you can map these to a numeric output property named “unemployment”:

    topojson \
      -o output.json \
      -e unemployment.tsv \
      --id-property=+FIPS \
      -p unemployment=+rate \
      -- input.shp
    

    An example of this approach is the Kentucky population choropleth, bl.ocks.org/5144735.
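
For the other route, joining in the client, the core step is a lookup from ID to value. The sketch below is plain JavaScript with no D3 calls, so it can be tried in Node. The names parseCsv and joinByIso are hypothetical helpers, and the inline CSV text and feature array are small stand-ins for danger.csv and for the features decoded from topo.json. In a browser those inputs would typically come from D3's CSV loader and the TopoJSON client library instead.

```javascript
// Minimal CSV parser for simple files without quoted fields.
// Hypothetical helper; in a D3 page the CSV loader would do this work.
function parseCsv(text) {
  const [header, ...lines] = text.trim().split(/\r?\n/);
  const columns = header.split(",");
  return lines.map((line) => {
    const cells = line.split(",");
    return Object.fromEntries(columns.map((name, i) => [name, cells[i]]));
  });
}

// Join rows onto features by ISO code. Returns new feature objects and
// leaves the inputs untouched. Features with no matching row get level null.
function joinByIso(features, rows) {
  // Coerce level to a number, much as "+rate" does on the command line.
  const levelByIso = new Map(rows.map((row) => [row.iso, +row.level]));
  return features.map((feature) => ({
    ...feature,
    properties: {
      ...feature.properties,
      // Use has() rather than a truthiness check so a level of 0 survives.
      level: levelByIso.has(feature.id) ? levelByIso.get(feature.id) : null,
    },
  }));
}

// Stand-in inputs (assumptions for illustration, not real map data).
const csvText = "iso,level\nAFG,100\nALB,0\nDZA,12\n";
const features = [
  { type: "Feature", id: "AFG", properties: {}, geometry: null },
  { type: "Feature", id: "ALB", properties: {}, geometry: null },
  { type: "Feature", id: "FRA", properties: {}, geometry: null },
];

const joined = joinByIso(features, parseCsv(csvText));
console.log(joined.map((f) => [f.id, f.properties.level]));
```

Features without a row (FRA in the stand-in data) end up with a null level, which makes gaps in the dataset easy to style separately. Running the same function once ahead of time and saving the result would amount to baking the data in, so the choice between the two approaches is mostly about when the join runs.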
