I already know (see "How to stop writeOGR from abbreviating Field Names when using 'ESRI Shapefile' driver") that writeOGR abbreviates column names longer than 10 characters. But there is one more thing: if two column names are identical when compared case-insensitively, it drops the data from one or both of them!
library(rgeos)
library(rgdal)
library(sp)
line.data.df <- data.frame(a = 1, b = 2, Aa = 3, bB=4, aa = "hello", bb = "bug" , Depth.Stratum.max = 6, Depth.Stratum.min =3)
l <- rgeos::readWKT("LINESTRING(0 7,1 6,2 1,3 4,4 1,5 7,6 6,7 4,8 6,9 4)")
l.df <- SpatialLinesDataFrame(sl = l, data = line.data.df)
writeOGR(obj = l.df, dsn = ".", layer = "temp.samplings", driver = "ESRI Shapefile", overwrite_layer = TRUE)
l.shp <- readOGR(dsn = ".", layer = "temp.samplings")
l.shp@data
This gives the following output:
> library(rgeos)
rgeos version: 0.3-23, (SVN revision 546)
GEOS runtime version: 3.5.1-CAPI-1.9.1 r4246
Linking to sp version: 1.2-5
Polygon checking: TRUE
> library(rgdal)
Loading required package: sp
rgdal: version: 1.2-8, (SVN revision 663)
Geospatial Data Abstraction Library extensions to R successfully loaded
Loaded GDAL runtime: GDAL 2.1.3, released 2017/20/01
Path to GDAL shared files: /usr/share/gdal/2.1
Loaded PROJ.4 runtime: Rel. 4.9.2, 08 September 2015, [PJ_VERSION: 492]
Path to PROJ.4 shared files: (autodetected)
Linking to sp version: 1.2-5
> library(sp)
> line.data.df <- data.frame(a = 1, b = 2, Aa = 3, bB=4, aa = "hello", bb = "bug" , Depth.Stratum.max = 6, Depth.Stratum.min =3)
> l <- rgeos::readWKT("LINESTRING(0 7,1 6,2 1,3 4,4 1,5 7,6 6,7 4,8 6,9 4)")
> l.df <- SpatialLinesDataFrame(sl = l, data = line.data.df)
> writeOGR(obj =l.df, dsn = ".", layer = "temp.samplings", driver="ESRI Shapefile", overwrite_layer = T)
Warning message:
In writeOGR(obj = l.df, dsn = ".", layer = "temp.samplings", driver = "ESRI Shapefile", :
Field names abbreviated for ESRI Shapefile driver
> l.shp <- readOGR(dsn = ".", layer = "temp.samplings")
OGR data source with driver: ESRI Shapefile
Source: ".", layer: "temp.samplings"
with 1 features
It has 8 fields
> l.shp@data
  a b Aa bB aa_1 bb_1 Dpth_Strtm Dpth_Str_1
0 1 2  0  0 <NA> <NA>         NA         NA
You can see that the strange and annoying abbreviation, combined with case-insensitivity, caused columns (especially my Depth.Stratum.min and Depth.Stratum.max) to end up with colliding names, resulting in data loss.
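The columns at risk can be spotted before writing. The sketch below is only a rough pre-check: `find_name_clashes` is a hypothetical helper (not part of rgdal or GDAL), and it assumes that a clash means "same first 10 characters, ignoring case". It does not reproduce the abbreviation that writeOGR actually applies.

```r
# Hypothetical helper: flag column names that clash once case is ignored
# and names are cut to an assumed 10-character limit.
find_name_clashes <- function(field_names, max_len = 10) {
  key <- tolower(substr(field_names, 1, max_len))
  # Mark every member of a clashing group, not just the later duplicates.
  clashing <- duplicated(key) | duplicated(key, fromLast = TRUE)
  # Group the original names by their shared lower-case key.
  split(field_names[clashing], key[clashing])
}

cols <- c("a", "b", "Aa", "bB", "aa", "bb",
          "Depth.Stratum.max", "Depth.Stratum.min")
clashes <- find_name_clashes(cols)
print(clashes)
```

For the column names used here, this groups Aa with aa, bB with bb, and the two Depth.Stratum columns, which are exactly the fields that come back damaged.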
Weird:
- Some letters in the names are kept in UPPER case.
- In the case of columns aa and Aa, data from one of them was preserved; in the case of my columns Depth.Stratum.min and Depth.Stratum.max, neither contained the original data.
Is this behavior a bug, or is it just something I have to live with?
Best Answer
It is not clear from the shapefile specification (https://www.esri.com/library/whitepapers/pdfs/shapefile.pdf) which .dbf version is used for the attributes, but it may be dBase IV. Older versions of .dbf have strict limits on field names. This page, http://www.okstate.edu/sas/v8/sashtml/accpc/z0214453.htm, may not be the most official document, but it makes clear that naming conventions have varied even between operating systems.
How the GDAL shapefile driver renames fields is documented at http://www.gdal.org/drv_shapefile.html. Your case feels like a bug, because GDAL tries to create unique names, as in the example
However, why rely on the automatic renaming when you can take full control and rename the fields yourself in the source data?
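A minimal sketch of renaming up front, in base R only. `make_dbf_names` is a hypothetical helper written for this illustration, and the rules it applies (at most 10 characters, only letters, digits and underscores, unique when case is ignored) are assumptions about what is safe for a .dbf file rather than GDAL's own algorithm.

```r
# Hypothetical helper: build field names that are at most 10 characters
# long and unique even when case is ignored.
make_dbf_names <- function(field_names, max_len = 10) {
  out <- character(length(field_names))
  for (i in seq_along(field_names)) {
    # Replace anything other than letters, digits and "_" (e.g. dots).
    base <- gsub("[^A-Za-z0-9_]", "_", field_names[i])
    candidate <- substr(base, 1, max_len)
    n <- 1
    # Append a numeric suffix until the name is unused (ignoring case).
    while (tolower(candidate) %in% tolower(out[seq_len(i - 1)])) {
      suffix <- paste0("_", n)
      candidate <- paste0(substr(base, 1, max_len - nchar(suffix)), suffix)
      n <- n + 1
    }
    out[i] <- candidate
  }
  out
}

line.data.df <- data.frame(a = 1, b = 2, Aa = 3, bB = 4,
                           aa = "hello", bb = "bug",
                           Depth.Stratum.max = 6, Depth.Stratum.min = 3)
names(line.data.df) <- make_dbf_names(names(line.data.df))
print(names(line.data.df))
```

With names fixed like this before calling writeOGR, the driver should have nothing left to abbreviate. Hand-picked names (for example a short, distinct name for each of the two depth columns) are usually more readable than generated suffixes.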
I ran a test with ogr2ogr, and it seems to do a better job than your code.
Create a csv file
Check with ogrinfo
Convert into shapefile (dbf)
Check the result