[GIS] Choosing UTM zone to use for large country

asiacoordinate systemqgisutmwgs84

I want to reproject OpenStreetMap roads data from the current projection (WGS 84) to UTM, since I read in another GIS SE Q&A(Getting $length in meters in QGIS?) that WGS 84 cannot be used to measure length in m or km. My end goal is to calculate the total roadway kilometres within each district, so that I can enter that variable into my regression model. The country in question is Indonesia, and as you can see from the map below, it covers UTM zones 46-54.

UTM zones map from Wikimedia

Image source: Wikimedia, http://upload.wikimedia.org/wikipedia/commons/e/ed/Utm-zones.jpg

Does it really matter which one I choose? Can I choose the one that's roughly in the middle and call it good (that would be zone 50) or should I pick the zone that is more densely populated with roads (Jakarta/Java, zone 48)?

I am only barely functionally literate in GIS. The only formal training I had was a 3-day course back in 2008. Really, the only reason I need the roadway kms variable was because one of my thesis committee members wanted me to account for it in my model…

And to confirm, once I have reprojected to the UTM CRS, I can use my GIS software's built-in length tool to directly calculate the length in metres (or divide by 1000 for km), right?

I am using QGIS.

Best Answer

You use a UTM zone when your area of interest fits completely within it or very nearly so. A UTM zone is not appropriate when your area of interest spans several zones such as in your case. A little overlap into a neighboring zone might be ok, but the further away from the zone you pick, the more distortion there will be and the more it matters. I found this page with a graphic example.

You actually want a projection designed to cover that area and minimize the appropriate distortions (shape, area, distance, direction - can't have them all minimized in one projection), as mdsumner suggests. Note that you can always take a standard projection and modify it to best suite your particular area of interest by changing the detailed settings. That does take a level of understanding to know the impact your choices have on distortions and whether or not they are acceptable. And of course the larger the area you look at, the greater the compromise you make in distortions for some areas. This Esri pdf is a good introduction to projections.

You could also get the length calculations you need on a per zone basis if you want to stick with UTM and its level of distortion/error.

  1. First you'd find a dataset that is the UTM grid in polygon form with an attribute identifying the zone.
  2. Intersect that with your roads so that each road gets an attribute defining what zone it is in. This also splits the road at the zone boundary since it's likely that you'll have roads crossing them. With a spatial join you could end up counting the same road twice.
  3. Since you also want totals per district, you'd want to next Intersect your updated roads with the district boundary layer to again split them at boundaries and get an attribute defining their district. Be cautious with intersect - if you have roads that aren't within a district, they will get dropped because intersect only looks at areas of overlap between layers and not just one or the other.
  4. Then you can add a field to hold your length calculation (you do not want to rely on the default shape_length field, since that is system tracked and tied to the projection of the data). Also note you need to do both intersects before calculating length, because your length field won't automatically update with changes. If you only did zones, then intersected with districts, each district piece of the road would have a length value for the whole zone and not just the district unless you recalculated it.
  5. Now start reprojecting (not defining, note the difference) the data into each UTM zone, select only the records with that zone attribute, and field calculate their length into your field. You don't need to split it to separate layers/shapefiles to do that, just make sure you're only working with the correct records for the current projection. Once you've calculated all the lengths you can go back to WGS84, and your length attribute will remain as whatever units it was originally calculated to.
  6. Finally, in ArcGIS you'd use Summary Statistics to total road length per district, but in QGIS you need the GroupStats plugin or something similar. Or you pull the table into another software that lets you do the same district case-based sum.

As an experiment, you can always add another length field, set a different projection that covers more than just a UTM zone, such as EPSG:3001 I mentioned earlier in a comment, field calculate that length, and compare it to your UTM lengths to see what kind of difference it makes.