I am searching for a more in-depth explanation of the differences between shp and shx files, to little avail. I mean beyond 'shp contains geometry – shx contains an index of the geometry'.
I ask because, while working in QGIS recently, I made two observations that raised questions in my mind about the exact differences between these file extensions:
- QGIS can open and display both shx and shp, and each file appears to be more or less identical in their output (display),
- but not exactly so – I have noticed that sometimes the matching shx/shp files display slightly 'off-kilter' relative to each other. It doesn't appear to be a projection issue; they simply don't draw in exactly the same location as each other.
These observations made me curious as to why these differences in display exist, and why QGIS can open and operate the shx in the same manner as the shp, when previously my understanding was that the shp is the 'master' file if you will, but requires .dbf and .shx to function correctly as a single, whole entity.
Best Answer
The definitive reference on the shapefile format is the ESRI Shapefile Technical Description.
It is misleading to describe the `shx` as being an "index." Instead, it is the direct access offset file. There is no data in the `shx`, only a clone of the first hundred bytes of the `shp` (with the length block in bytes 24-27 sized for the `shx` length) followed by, for each record, the offset to the starting byte of that record in the `shp` and the record's content length; the record number is implied by the entry's position. The only location for attributes is the `dbf` (which is standalone -- despite "knowledge" to the contrary, the `shx` does not tie the `shp` and `dbf`, only record number does that).

It is possible for shapefiles to have "gaps" in the `shp` which make the `shx` indispensable, but in practice Esri tools will rewrite the entire `shp` and `shx` so that any gap created by editing records is removed. Under most conditions, it is possible to recover the `shx` contents if it goes missing; the same cannot be said for the `shp` or `dbf`.

The naming of `shp` and `shx` is an artifact of the `VFILE` variable width direct access module of the PRIMOS operating system, first ported by Esri to Unix, VAX/VMS, Data General, and IBM, then to Microsoft Windows. The `sbn`/`sbx` spatial index pair shares the same naming convention (though these are not documented within the shapefile specification). Within the original `VFILE` FORTRAN library, only the base file was named, and the offset file with an `x` terminal character just appeared at file creation.
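
To make the offset-file layout and the "recoverable `shx`" point concrete, here is a minimal Python sketch using only the standard `struct` module. It assumes the layout given in the ESRI Shapefile Technical Description: a 100-byte header, big-endian 32-bit integers for the file length and record headers, and offsets and lengths counted in 16-bit words. It also assumes a `shp` with no gaps between records. The function names (`make_point_shp`, `rebuild_shx`, `read_shx`) are hypothetical helpers for illustration, not part of any library, and the synthetic test file leaves the bounding box zeroed.

```python
import struct

HEADER_SIZE = 100  # bytes; the header layout is shared by .shp and .shx


def make_point_shp(points):
    """Build a tiny in-memory .shp of Point records (demo data only)."""
    records = b""
    for number, (x, y) in enumerate(points, start=1):
        # Record content: shape type 1 (Point) plus x and y, little-endian.
        content = struct.pack("<idd", 1, x, y)
        # Record header: record number and content length in 16-bit words, big-endian.
        records += struct.pack(">ii", number, len(content) // 2) + content
    header = bytearray(HEADER_SIZE)  # bounding box is left as zeros in this demo
    struct.pack_into(">i", header, 0, 9994)  # file code
    struct.pack_into(">i", header, 24, (HEADER_SIZE + len(records)) // 2)  # file length
    struct.pack_into("<ii", header, 28, 1000, 1)  # version, shape type
    return bytes(header) + records


def rebuild_shx(shp_bytes):
    """Recreate .shx bytes by walking the records of a gap-free .shp."""
    entries = []
    pos = HEADER_SIZE
    while pos + 8 <= len(shp_bytes):
        _number, content_words = struct.unpack_from(">ii", shp_bytes, pos)
        entries.append((pos // 2, content_words))  # offset stored in 16-bit words
        pos += 8 + content_words * 2  # skip record header and content
    header = bytearray(shp_bytes[:HEADER_SIZE])  # clone of the .shp header
    # Bytes 24-27 must describe the .shx length, not the .shp length.
    struct.pack_into(">i", header, 24, (HEADER_SIZE + 8 * len(entries)) // 2)
    body = b"".join(struct.pack(">ii", off, length) for off, length in entries)
    return bytes(header) + body


def read_shx(shx_bytes):
    """Return a list of (byte_offset, content_byte_length), one per record."""
    (file_code,) = struct.unpack_from(">i", shx_bytes, 0)
    if file_code != 9994:
        raise ValueError("not a shapefile offset file")
    (file_words,) = struct.unpack_from(">i", shx_bytes, 24)
    count = (file_words * 2 - HEADER_SIZE) // 8  # each entry is 8 bytes
    entries = []
    for i in range(count):
        off, length = struct.unpack_from(">ii", shx_bytes, HEADER_SIZE + 8 * i)
        entries.append((off * 2, length * 2))  # convert words to bytes
    return entries


if __name__ == "__main__":
    shp = make_point_shp([(1.0, 2.0), (3.0, 4.0)])
    print(read_shx(rebuild_shx(shp)))  # [(100, 20), (128, 20)]
```

Note that no entry stores a record number: the first entry describes record 1, the second record 2, and so on. If the `shp` did contain gaps, the sequential walk in `rebuild_shx` would lose its place, which is the case where the original `shx` cannot be reconstructed this way.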