Shapefile – Why Does ESRI Make Me Store My DBF Doubles as Strings?

Tags: dbf, shapefile

As far as I can tell, I have to store all numeric values in a shapefile's associated DBF file as strings (yes, the N type is essentially a string).
Can anyone confirm that ESRI does not support the Double (O) or Long (I) data types in the .DBF file?

I'm using this DBF reference

Based on the responses thus far, I apparently did not provide enough information. My question is not one of precision but rather one of optimization. When I manually set a DBF field to type O (double) in the DBF file, I receive this message when trying to load the standalone table into ArcMap:
[Screenshot of ArcMap error dialog: "Field type fail"]

Storing the number 1.23456789123434 as type N (defined in linked document as: "Number stored as a string, right justified, and padded with blanks to the width of the field.") costs me 15 bytes while storing the same number as type O (double) would cost me only 8 bytes. Please correct me if I am wrong on this.
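The comparison can be checked with a minimal Python sketch. (One small correction to the figure above: this particular literal is 16 characters of ASCII text, while an IEEE-754 double is always 8 bytes regardless of the value.)

```python
import struct

# The literal as it would appear in a type-N field: plain ASCII text,
# one byte per character.
text = "1.23456789123434"
n_cost = len(text.encode("ascii"))

# The same value as an IEEE-754 double, i.e. what a type-O field holds.
o_cost = len(struct.pack("<d", float(text)))

print(n_cost)  # 16
print(o_cost)  # 8
```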

Best Answer

This is not an ESRI issue, because the specification of the DBF structure antedates ESRI's use (in shapefiles) by more than a decade.

I will discuss the standard dBase III specification, because that is essentially what shapefiles use. (The link in the question is to a much more recent extension of the format, "dBase 7," which is by no means universal or standard and is unlikely to be supported by much software.)

You are correct that internally the data are stored as ASCII character strings. DBF III supports just four types: character, date, logical, and numeric. (I am omitting discussion of a "memo" field because it is not supported in shapefiles and requires an auxiliary binary file.) A "numeric" field type indicates that the ASCII string should be interpreted essentially as if it were typed in (in base 10). This differs from float and double formats in which values are stored in a binary format consisting of a base two "mantissa," a base two exponent, and a sign bit.
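A short Python sketch of that difference: a type-N field is just base-10 text that gets parsed on read, while a double is a fixed 8-byte binary encoding. The field layout shown (width 10, 2 decimals) is an illustrative assumption, not taken from any particular file.

```python
import struct

# A type-N field of width 10 with 2 decimals, exactly as it sits in the
# record: right justified and padded with leading blanks.
n_field = b"    -12.35"

# Reading it back is simply base-10 text parsing.
value = float(n_field.decode("ascii").strip())
print(value)  # -12.35

# A true binary double is 8 bytes: a sign bit, an 11-bit base-two
# exponent, and a 52-bit base-two mantissa.
packed = struct.pack("<d", -12.35)
print(len(packed))                      # 8
print(struct.unpack("<d", packed)[0])   # round-trips to -12.35
```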

Another aspect of the DBF III format is that all contents of the data records are interpreted as ASCII character strings. The presence of certain characters (such as an end-of-file marker) can cause some software to terminate reading of the file. Therefore--although it may be a tempting idea--it would be unwise to try to circumvent the DBF standard by storing binary formatted data within the DBF records, because sooner or later such "control characters" like EOF will occur.
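One concrete hazard, sketched in Python: the IEEE-754 encoding of a perfectly ordinary value can contain the byte 0x1A, the DOS end-of-file marker (Ctrl-Z). The value 6.5 happens to be one such case.

```python
import struct

# 6.5 in IEEE-754 double precision is 0x401A000000000000; one of its
# eight bytes is 0x1A, the DOS EOF marker (Ctrl-Z).
packed = struct.pack("<d", 6.5)
print(packed.hex())  # 0000000000001a40 (little-endian)
print(b"\x1a" in packed)  # True
```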

To make the most of limited file space, it is worth considering what the inherent precision of the data will be. For instance, a number like "12.3456789123456789," if it represents (say) an elevation in meters, likely contains a lot of meaningless "noise" at the end and could safely be rounded to 12.346, 12.35, 12.3, or even 12, depending on the precision and intended application.

When creating the DBF file you need to provide for enough characters to store the longest value anticipated, to sufficient precision to meet all foreseeable needs. For instance--returning to the elevation example--if you need to accommodate most ground elevations to two decimal places in meters, the ASCII patterns will range from -xxx.xx (which can represent elevations down to -999.99 meters) through xxxx.xx (i.e., up to 9,999.99 meters). Consequently you would need at least seven characters (even the decimal point counts as one!). You would declare this field to be numeric of width 7 with 2 decimal places.

By applying such thinking, you might be able to squeeze some bytes out of the file and stay under the 2 GB size limit imposed by Arc* software. (The DBF III standard itself uses unsigned offsets and can thereby accommodate 4 GB, as I recall.)
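The width arithmetic above can be sketched as a small Python helper. (`format_n_field` is a hypothetical name for illustration, not part of any DBF library.)

```python
def format_n_field(value, width, decimals):
    """Render a value as a DBF type-N field would store it: base-10 text,
    right justified, padded with blanks to the declared width.
    Returns None when the value does not fit in that width."""
    text = f"{value:.{decimals}f}"
    if len(text) > width:
        return None  # overflow: the declared field width is too small
    return text.rjust(width)

# A numeric field of width 7 with 2 decimals, per the elevation example:
print(repr(format_n_field(-999.99, 7, 2)))   # '-999.99'
print(repr(format_n_field(12.3, 7, 2)))      # '  12.30'
print(repr(format_n_field(9999.99, 7, 2)))   # '9999.99'
print(repr(format_n_field(-1000.00, 7, 2)))  # None: needs 8 characters
```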
