Shapefile File Formats – Difference Between SHX and SHP Files

file formats, qgis, shapefile

I am searching, to little avail, for a more in-depth explanation of the differences between shp and shx files. I mean beyond 'the shp contains geometry – the shx contains an index of the geometry'.

The reason I ask is that, while working in QGIS recently, I made two observations that raised questions about the exact differences between these file extensions:

  1. QGIS can open and display both shx and shp, and each file appears to be more or less identical in their output (display),
  2. but not exactly so – I have noticed that sometimes the matching shx/shp files display slightly 'off-kilter' relative to each other. It doesn't appear to be a projection issue, they simply don't draw in the exact same location as each other.

These observations made me curious as to why these differences in display exist, and why QGIS can open and operate the shx in the same manner as the shp, when previously my understanding was that the shp is the 'master' file if you will, but requires .dbf and .shx to function correctly as a single, whole entity.

Best Answer

The definitive reference on the shapefile format is the ESRI Shapefile Technical Description.

It is misleading to describe the shx as an "index." Instead, it is the direct access offset file. There is no data in the shx, only a clone of the first hundred bytes of the shp (with the file length field in bytes 24-27 set to the shx length) followed by one fixed 8-byte entry per record: the offset of that record's starting position in the shp and its content length, both stored as big-endian integers counted in 16-bit words. The record number is not stored; it is implied by the entry's position in the shx. The only location for attributes is the dbf, which is standalone: despite "knowledge" to the contrary, the shx does not tie the shp and dbf together; only the record number does that.
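To make the layout concrete, here is a minimal Python sketch that decodes an shx held in memory. It assumes the layout given in the ESRI Shapefile Technical Description (a 100-byte header, then 8-byte entries holding offset and content length as big-endian integers in 16-bit words). The function names `build_demo_shx` and `read_shx_entries` are made up for illustration, and the demo bytes are synthetic rather than taken from a real dataset.

```python
import struct

def build_demo_shx() -> bytes:
    """Build a synthetic shx for two Point records (illustrative data only)."""
    # Big-endian part of the header: file code 9994, five unused ints,
    # file length in 16-bit words (100 + 2 * 8 = 116 bytes = 58 words).
    header = struct.pack(">7i", 9994, 0, 0, 0, 0, 0, 58)
    # Little-endian part: version 1000, shape type 1 (Point), bounding box.
    header += struct.pack("<2i", 1000, 1)
    header += struct.pack("<8d", *([0.0] * 8))
    # Two entries: (offset, content length), both in 16-bit words.
    entries = struct.pack(">4i", 50, 10, 64, 10)
    return header + entries

def read_shx_entries(data: bytes):
    """Return a list of (offset_bytes, content_length_bytes) into the shp."""
    if len(data) < 100:
        raise ValueError("an shx starts with a 100-byte header")
    (file_code,) = struct.unpack(">i", data[0:4])
    if file_code != 9994:
        raise ValueError("unexpected file code")
    (length_words,) = struct.unpack(">i", data[24:28])
    end = min(len(data), length_words * 2)
    entries = []
    for pos in range(100, end - 7, 8):
        offset_words, content_words = struct.unpack(">ii", data[pos:pos + 8])
        # Convert from 16-bit words to bytes.
        entries.append((offset_words * 2, content_words * 2))
    return entries

if __name__ == "__main__":
    for number, (offset, length) in enumerate(read_shx_entries(build_demo_shx()), 1):
        print(f"record {number}: shp byte offset {offset}, content {length} bytes")
```

Note that the record number never appears in the decoded entries; it comes only from counting them, which is why `enumerate` supplies it here.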

It is possible for shapefiles to have "gaps" in the shp which make the shx indispensable, but in practice Esri tools will rewrite the entire shp and shx so that any gap created by editing records is removed. Under most conditions, it is possible to recover the shx contents if the file goes missing; the same cannot be said for the shp or dbf.
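As a rough illustration of why the shx is usually recoverable, the sketch below rebuilds one by walking the record headers in a shp. It assumes the shp has no gaps, so each record begins where the previous one ends. The names `build_demo_shp` and `rebuild_shx` are hypothetical, and the input is a synthetic two-point shp built in memory, not a real file.

```python
import struct

def build_demo_shp() -> bytes:
    """Build a synthetic shp with two Point records (illustrative data only)."""
    points = [(1.0, 2.0), (3.0, 4.0)]
    body = b""
    for number, (x, y) in enumerate(points, 1):
        content = struct.pack("<idd", 1, x, y)           # shape type 1 + x, y
        # Record header: record number and content length in 16-bit words.
        body += struct.pack(">ii", number, len(content) // 2) + content
    header = struct.pack(">7i", 9994, 0, 0, 0, 0, 0, (100 + len(body)) // 2)
    header += struct.pack("<2i", 1000, 1)                # version, shape type
    header += struct.pack("<8d", 1.0, 2.0, 3.0, 4.0, 0.0, 0.0, 0.0, 0.0)
    return header + body

def rebuild_shx(shp: bytes) -> bytes:
    """Recreate shx bytes from a gap-free shp (assumption: records are contiguous)."""
    header = bytearray(shp[:100])
    entries = []
    pos = 100
    while pos + 8 <= len(shp):
        _record_number, content_words = struct.unpack(">ii", shp[pos:pos + 8])
        # Each shx entry stores the record's offset and content length in words.
        entries.append(struct.pack(">ii", pos // 2, content_words))
        pos += 8 + content_words * 2
    # Patch the file length field (bytes 24-27) to describe the shx itself.
    struct.pack_into(">i", header, 24, (100 + 8 * len(entries)) // 2)
    return bytes(header) + b"".join(entries)

if __name__ == "__main__":
    shx = rebuild_shx(build_demo_shp())
    print(len(shx), struct.unpack(">4i", shx[100:116]))
```

If the shp did contain gaps, this walk would misread the bytes in a gap as a record header, which is exactly the case where the original shx cannot be reconstructed this way.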

The naming of shp and shx is an artifact of the VFILE variable width direct access module of the PRIMOS operating system, first ported by Esri to Unix, VAX/VMS, Data General, and IBM, then to Microsoft Windows. The sbn/sbx spatial index pair shares the same naming convention (though these files are not documented within the shapefile specification). Within the original VFILE FORTRAN library, only the base file was named, and the offset file with an x terminal character just appeared at file creation.