GDAL OGR2OGR – Purpose of -NLN Tag in Merging Shapefiles


The basic script to iterate recursively over sub-folders and merge all shapefiles into a single one is:

#!/bin/bash
consolidated_file="./consolidated.shp"
# NUL-delimited names keep paths with spaces intact; the output file itself is skipped
find . -name '*.shp' ! -name 'consolidated.shp' -print0 | while IFS= read -r -d '' i; do
    if [ ! -f "$consolidated_file" ]; then
        # first file - create the consolidated output file
        ogr2ogr -f "ESRI Shapefile" "$consolidated_file" "$i"
    else
        # update the output file with new file content
        ogr2ogr -f "ESRI Shapefile" -update -append "$consolidated_file" "$i"
    fi
done

However, in virtually all the examples around the web, I noticed that the -nln option is added to the command that updates the output file, for example:

ogr2ogr -f "ESRI Shapefile" -update -append $consolidated_file $i -nln merged

The documentation describes it as:

Assign an alternate name to the new layer

I also noticed that it creates a temporary shapefile called "merged", and at the end of the loop that file is identical to the last shapefile I merged.

I don't understand why I need this, because the merge succeeded without this option.

Best Answer

In GDAL, datastores contain layers. Some datastores, such as databases or GML, can hold several layers, while others, such as shapefiles, can contain only one.

You can test what happens without the -nln switch on a datastore that can contain many layers, for example with the GeoPackage driver:

ogr2ogr -f gpkg merged.gpkg a.shp
ogr2ogr -f gpkg -append -update merged.gpkg b.shp

ogrinfo merged.gpkg
INFO: Open of `merged.gpkg'
      using driver `GPKG' successful.
1: a (Polygon)
2: b (Polygon)
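
For comparison, here is a minimal sketch of the same GeoPackage test with -nln, so that both inputs should land in one layer instead of two. The helper name merge_into_layer and the layer name merged are illustrative and not part of GDAL; the sketch assumes ogr2ogr is on the PATH.

```shell
# merge_into_layer DEST LAYER SRC...  (hypothetical helper, not part of GDAL)
# Creates LAYER in DEST from the first source, then appends the rest to it.
merge_into_layer() {
    local dest layer src first
    dest=$1
    layer=$2
    shift 2
    first=1
    for src in "$@"; do
        if [ "$first" -eq 1 ]; then
            # first source: create the GeoPackage and name its layer explicitly
            ogr2ogr -f GPKG -nln "$layer" "$dest" "$src" || return 1
            first=0
        else
            # later sources: -nln points at the existing layer, so no new layer is made
            ogr2ogr -f GPKG -update -append -nln "$layer" "$dest" "$src" || return 1
        fi
    done
}
```

Calling merge_into_layer merged.gpkg merged a.shp b.shp and then running ogrinfo merged.gpkg should report a single layer named merged rather than separate a and b layers.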

The shapefile driver does not strictly need the layer name: if you give the datastore name as "a.shp", the driver sees a single layer named after the basename of the shapefile. Therefore you can add data to "merged.shp" with the commands:

ogr2ogr -f "ESRI Shapefile" merged.shp a.shp
ogr2ogr -f "ESRI Shapefile" -append -update merged.shp b.shp

However, the shapefile driver has additional logic: a datastore whose name is given without the .shp extension is treated as a multi-layer datastore. In practice this means a directory that contains one or more shapefiles as layers. You can test what happens with the commands:

ogr2ogr -f "ESRI Shapefile" merged a.shp
ogr2ogr -f "ESRI Shapefile" -append -update merged b.shp

Or you can edit your script slightly to have:

consolidated_file="./consolidated"

If you want to append data with ogr2ogr, the -nln switch is compulsory with some drivers, including a few that don't support multiple layers. For other drivers it is not strictly necessary, but using -nln is always safe, and fortunately it is used in the examples you have found. Otherwise we would have a bunch of questions about why merging into shapefiles succeeds while merging into other formats just creates new layers.
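
As a rough sketch, the loop from the question could pass -nln explicitly. Because the shapefile driver names the single layer after the file's basename, the layer name is derived from the output path here (an assumption that follows from the explanation above). The helper name merge_tree is hypothetical.

```shell
# merge_tree ROOT DEST  (hypothetical helper, not part of GDAL)
# Recursively appends every shapefile under ROOT to the shapefile DEST.
merge_tree() {
    local root dest layer shp
    root=$1
    dest=$2
    # shapefile layers are named after the basename of the .shp file
    layer=$(basename "$dest" .shp)
    # Skip files named like the output so it is never merged into itself.
    # One path per line: names containing newlines are not handled.
    find "$root" -name '*.shp' ! -name "$(basename "$dest")" -print |
    while IFS= read -r shp; do
        if [ ! -f "$dest" ]; then
            # first file: create the output and name its layer
            ogr2ogr -f "ESRI Shapefile" -nln "$layer" "$dest" "$shp"
        else
            # later files: append to the same, explicitly named layer
            ogr2ogr -f "ESRI Shapefile" -update -append -nln "$layer" "$dest" "$shp"
        fi
    done
}
```

For the question's layout this would be called as merge_tree . ./consolidated.shp. With a multi-layer output format, the same pattern keeps all inputs in one layer as long as every call passes the same -nln value.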

Related Solutions

[GIS] Merging shapefiles with GeoTools

Shapefiles in GeoTools are not mutable - the only way to change the schema of the shapefile is to read it in and write it back out to a new file with the modified schema.

To merge heterogeneous shapefiles you will need to read the schema of each file you want to import and then create a new schema that contains each attribute in those schemas. Then for each file read in the features and convert them to the new schema and write it back out to the new file.

[GIS] Merging multiple feature classes in file geodatabase with ogr2ogr

To select a single feature class from a gdb, we query for all entities inside it:

-sql "select * from FEATURE_CLASS_NAME"

To generate a list of feature classes following your sample use nested FOR loops:

@echo off
for %%S in (01 02 03) do (
    for %%F in (01 02 03) do (
        echo ogr2ogr out.gdb in.gdb -sql "select * from Sec%%S_Frm%%F"
        )
     )

Emits:

ogr2ogr out.gdb in.gdb -sql "select * from Sec01_Frm01"
ogr2ogr out.gdb in.gdb -sql "select * from Sec01_Frm02"
ogr2ogr out.gdb in.gdb -sql "select * from Sec01_Frm03"
ogr2ogr out.gdb in.gdb -sql "select * from Sec02_Frm01"
...

From there, add -nln OUT_LAYER_NAME to send everything to one layer, and -nlt multipolygon to set the geometry type (NB: -nlt is often derived automatically from the input, but for some operations like this one the geometry is omitted and you get just a table):

ogr2ogr out.gdb in.gdb -nln Form01 -nlt multipolygon -sql ...

Finally, since we are updating existing data, add -update -append:

ogr2ogr out.gdb in.gdb -update -append -nln ...
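
For readers on a Unix shell rather than Windows, the fragments above could be assembled roughly as follows. Like the batch loop, this sketch only prints the commands for review. The helper name gdb_merge_commands is hypothetical, the names in.gdb, out.gdb and Form01 come from the examples above, and combining all the switches in one command line is an assumption about how the pieces fit together.

```shell
# gdb_merge_commands IN_GDB OUT_GDB LAYER  (hypothetical helper, not part of GDAL)
# Prints one ogr2ogr command per SecNN_FrmNN feature class, all targeting LAYER.
gdb_merge_commands() {
    local in_gdb out_gdb layer s f
    in_gdb=$1
    out_gdb=$2
    layer=$3
    for s in 01 02 03; do
        for f in 01 02 03; do
            # printed only, not executed; paths with spaces would need extra quoting
            printf 'ogr2ogr %s %s -update -append -nln %s -nlt multipolygon -sql "select * from Sec%s_Frm%s"\n' \
                "$out_gdb" "$in_gdb" "$layer" "$s" "$f"
        done
    done
}
```

Running gdb_merge_commands in.gdb out.gdb Form01 prints nine command lines that can be inspected before being run.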

Useful options:

--config FGDB_BULK_LOAD YES
-skipfailures

How do I include the source filename in a field when merging hundreds of shapefiles (Windows)?

for %f in (*.shp) do (
    ogr2ogr -sql "select *, '%~nf' as s_file from %~nf" -update -append merged\output.shp %f -nln output
    )
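
A rough bash equivalent of the loop above, for non-Windows shells. The helper name merge_with_source is hypothetical, and the sketch assumes the output directory already exists and that the layer names need no quoting in OGR SQL (no spaces or special characters), as in the Windows version.

```shell
# merge_with_source DEST SRC...  (hypothetical helper, not part of GDAL)
# Appends each source shapefile to DEST and records its basename in an s_file field.
merge_with_source() {
    local dest layer src base
    dest=$1
    shift
    # shapefile layers are named after the basename of the .shp file
    layer=$(basename "$dest" .shp)
    for src in "$@"; do
        base=$(basename "$src" .shp)
        # the source basename is used both as the literal value and as the table name
        ogr2ogr -sql "select *, '$base' as s_file from $base" \
            -update -append "$dest" "$src" -nln "$layer" || return 1
    done
}
```

It could be called as merge_with_source merged/output.shp ./*.shp, which keeps the output outside the set of files matched by the glob.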


By the way, if you still have the gdb with the "unwieldy number of feature classes" hanging around from previous attempts, you might be able to do it all in one go with:

ogr2ogr -f FileGDB clean.gdb unwieldy.gdb -nln Clean_Form01 -nlt multipolygon

Code samples are in Windows batch file syntax. Thank you for this question. It prompted me to extend a related project that I've been needing to come back to.

I don't think ogrinfo can write, so that would be the origin of the lock error in the Dec 11 update.
