Shapefile – rgdal::writeOGR Collision in Column Names Results in NULL Data


I already know (see "How to stop writeOGR from abbreviating Field Names when using “ESRI Shapefile” driver") that writeOGR abbreviates column names longer than 10 characters. But there is one more problem: if two column names are identical when compared case-insensitively, the data in one or both of them is lost!

library(rgeos)
library(rgdal)
library(sp)
line.data.df <- data.frame(a = 1, b = 2, Aa = 3, bB=4, aa = "hello", bb = "bug" , Depth.Stratum.max = 6, Depth.Stratum.min =3)
l <- rgeos::readWKT("LINESTRING(0 7,1 6,2 1,3 4,4 1,5 7,6 6,7 4,8 6,9 4)")
l.df <- SpatialLinesDataFrame(sl = l, data = line.data.df)
writeOGR(obj =l.df, dsn = ".",  layer = "test_abbreviation", driver="ESRI Shapefile", overwrite_layer = T)
l.shp <- readOGR(dsn = ".", layer = "test_abbreviation")
l.shp@data

With output

> library(rgeos)
rgeos version: 0.3-23, (SVN revision 546)
 GEOS runtime version: 3.5.1-CAPI-1.9.1 r4246 
 Linking to sp version: 1.2-5 
 Polygon checking: TRUE 

> library(rgdal)
Loading required package: sp
rgdal: version: 1.2-8, (SVN revision 663)
 Geospatial Data Abstraction Library extensions to R successfully loaded
 Loaded GDAL runtime: GDAL 2.1.3, released 2017/20/01
 Path to GDAL shared files: /usr/share/gdal/2.1
 Loaded PROJ.4 runtime: Rel. 4.9.2, 08 September 2015, [PJ_VERSION: 492]
 Path to PROJ.4 shared files: (autodetected)
 Linking to sp version: 1.2-5 
> library(sp)
> line.data.df <- data.frame(a = 1, b = 2, Aa = 3, bB=4, aa = "hello", bb = "bug" , Depth.Stratum.max = 6, Depth.Stratum.min =3)
> l <- rgeos::readWKT("LINESTRING(0 7,1 6,2 1,3 4,4 1,5 7,6 6,7 4,8 6,9 4)")
> l.df <- SpatialLinesDataFrame(sl = l, data = line.data.df)
> writeOGR(obj =l.df, dsn = ".",  layer = "temp.samplings", driver="ESRI Shapefile", overwrite_layer = T)
Warning message:
In writeOGR(obj = l.df, dsn = ".", layer = "temp.samplings", driver = "ESRI Shapefile",  :
  Field names abbreviated for ESRI Shapefile driver
> l.shp <- readOGR(dsn = ".", layer = "temp.samplings")
OGR data source with driver: ESRI Shapefile 
Source: ".", layer: "temp.samplings"
with 1 features
It has 8 fields
> l.shp@data
  a b Aa bB aa_1 bb_1 Dpth_Strtm Dpth_Str_1
0 1 2  0  0 <NA> <NA>         NA         NA

You can see that the strange and annoying abbreviation, combined with case-insensitivity, caused columns (especially my Depth.Stratum.min and Depth.Stratum.max) to end up with colliding names, resulting in data loss.
Weird:

  1. Some letters in the names are kept in upper case.

  2. For the columns aa and Aa, data from one of them was preserved, whereas for my columns Depth.Stratum.min and Depth.Stratum.max neither contains the original data.

Is this behavior a bug, or is it just something I have to live with?

Best Answer

The shapefile specification (https://www.esri.com/library/whitepapers/pdfs/shapefile.pdf) does not make clear which .dbf version is used for the attributes, but it may be dBase IV. Older versions of dbf have strict limits on field names. This page, http://www.okstate.edu/sas/v8/sashtml/accpc/z0214453.htm, may not be the most official document, but it makes clear that the naming conventions have varied even between operating systems:

DBF File Naming Conventions

Filenames must also follow operating-system specific conventions, so check the documentation that comes with your dBASE product or other software products for further information. The following conventions apply to DBF filenames and field names:

Under Windows 95, Windows 98, Windows NT, and OS/2, the ACCESS and DBLOAD procedures support long names that are specified in the PATH= statement (such as path='c:\sasdemo\library\customer99.dbf';) However, some applications that support dBASE files might not accept files with long names.

Filenames or field names start with a letter, and they can contain any combination of the letters A through Z, the digits 0 through 9, the colon (:) (in dBASE II field names only), and the underscore (_).

Database field names can be from one to ten characters long. Each field in a DBF file has a unique name.

Filenames or field names are not case sensitive; that is, CUSTOMER is the same as Customer. Field names typed in lowercase are changed to uppercase on the display.
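The quoted rules can be turned into a quick check that runs before anything is written to disk. The following is only a minimal sketch in base R: check_dbf_names() is a hypothetical helper of my own, not a function from rgdal or GDAL, and it tests only the three rules quoted above (allowed characters, length of one to ten, case-insensitive uniqueness).

```r
# Check candidate field names against the DBF rules quoted above.
# check_dbf_names() is a hypothetical helper, not part of rgdal or GDAL.
check_dbf_names <- function(field_names) {
  upper <- toupper(field_names)
  data.frame(
    name = field_names,
    # starts with a letter, then only letters, digits or underscores
    valid_chars = grepl("^[A-Za-z][A-Za-z0-9_]*$", field_names),
    # between one and ten characters long
    valid_length = nchar(field_names) >= 1 & nchar(field_names) <= 10,
    # no other name is identical once case is ignored
    unique_ci = !(duplicated(upper) | duplicated(upper, fromLast = TRUE)),
    stringsAsFactors = FALSE
  )
}

# Column names taken from the question
cols <- c("a", "b", "Aa", "bB", "aa", "bb",
          "Depth.Stratum.max", "Depth.Stratum.min")
report <- check_dbf_names(cols)
print(report)
```

For the names in the question, this flags Aa, bB, aa and bb as case-insensitive duplicates, and both Depth.Stratum columns as too long and as containing a character (the dot) that the rules do not allow.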

How the GDAL shapefile driver renames fields is documented at http://www.gdal.org/drv_shapefile.html. Your case feels like a bug, because GDAL tries to create unique names, as in this example:

a → a, a → a_1, A → A_2;

However, why rely on automatic renaming when you can take full control and rename the fields yourself in the source data?
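As an illustration of renaming the fields yourself, here is a minimal base R sketch. make_dbf_names() is a hypothetical helper, and its "_1", "_2" suffix scheme is my own choice rather than a reproduction of what GDAL or writeOGR does internally. It replaces disallowed characters with underscores, truncates to ten characters, and appends a numeric suffix whenever a name is already taken in a case-insensitive comparison.

```r
# Build DBF-safe field names: allowed characters only, at most max_len
# characters, unique when compared case-insensitively.
# make_dbf_names() is a hypothetical helper; the suffix scheme is an assumption.
make_dbf_names <- function(field_names, max_len = 10) {
  # replace anything that is not a letter, digit or underscore
  cleaned <- gsub("[^A-Za-z0-9_]", "_", field_names)
  # names must start with a letter, so prefix the others with "f"
  cleaned <- ifelse(grepl("^[A-Za-z]", cleaned), cleaned, paste0("f", cleaned))

  result <- character(length(cleaned))
  used <- character(0)  # upper-cased names that are already taken
  for (i in seq_along(cleaned)) {
    candidate <- substr(cleaned[i], 1, max_len)
    counter <- 0
    while (toupper(candidate) %in% used) {
      counter <- counter + 1
      suffix <- paste0("_", counter)
      # shorten the stem so that stem plus suffix still fits into max_len
      stem <- substr(cleaned[i], 1, max_len - nchar(suffix))
      candidate <- paste0(stem, suffix)
    }
    used <- c(used, toupper(candidate))
    result[i] <- candidate
  }
  result
}

# Column names taken from the question
cols <- c("a", "b", "Aa", "bB", "aa", "bb",
          "Depth.Stratum.max", "Depth.Stratum.min")
new_names <- make_dbf_names(cols)
print(new_names)
```

In the question's script, this could be applied with names(line.data.df) <- make_dbf_names(names(line.data.df)) before the SpatialLinesDataFrame is built. Choosing short, meaningful names by hand is probably still the better option, since automatically truncated names such as Depth_Stra and Depth_St_1 no longer tell you which one is the maximum and which the minimum.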

I ran a test with ogr2ogr, and it seems to do a better job than your code.

Create a csv file

a,b,Aa,bB,aa,bb,Depth.Stratum.max,Depth.Stratum.min
1,2,3,4,"hello","bug",6,3

Check with ogrinfo

ogrinfo fieldnametest.csv -al
INFO: Open of `fieldnametest.csv'
      using driver `CSV' successful.

Layer name: fieldnametest
Geometry: None
Feature Count: 1
Layer SRS WKT:
(unknown)
a: String (0.0)
b: String (0.0)
Aa: String (0.0)
bB: String (0.0)
aa: String (0.0)
bb: String (0.0)
Depth.Stratum.max: String (0.0)
Depth.Stratum.min: String (0.0)
OGRFeature(fieldnametest):1
  a (String) = 1
  b (String) = 2
  Aa (String) = hello
  bB (String) = bug
  Depth.Stratum.max (String) = 6
  Depth.Stratum.min (String) = 3

Convert into shapefile (dbf)

ogr2ogr -f "ESRI Shapefile" fieldnametest.shp fieldnametest.csv
Warning 1: Field 'aa' already exists. Renaming it as 'aa2'
Warning 1: Field 'bb' already exists. Renaming it as 'bb2'
Warning 6: Normalized/laundered field name: 'Depth.Stratum.max' to 'Depth.Stra'
Warning 6: Normalized/laundered field name: 'Depth.Stratum.min' to 'Depth.St_1'

Check the result

ogrinfo fieldnametest.dbf -al
INFO: Open of `fieldnametest.dbf'
      using driver `ESRI Shapefile' successful.

Layer name: fieldnametest
Metadata:
  DBF_DATE_LAST_UPDATE=2017-09-12
Geometry: None
Feature Count: 1
Layer SRS WKT:
(unknown)
a: String (80.0)
b: String (80.0)
Aa: String (80.0)
bB: String (80.0)
aa2: String (80.0)
bb2: String (80.0)
Depth.Stra: String (80.0)
Depth.St_1: String (80.0)
OGRFeature(fieldnametest):0
  a (String) = 1
  b (String) = 2
  Aa (String) = 3
  bB (String) = 4
  aa2 (String) = hello
  bb2 (String) = bug
  Depth.Stra (String) = 6
  Depth.St_1 (String) = 3