[GIS] When to use GiST and when to use SP-GiST index

postgispostgresqlspatial-index

I'm aware of the documentation for both of the indexes, but can't decide which one to use.

PostGIS documentation on these indexes

Example:

Let's say we have table data with the POINT attribute that contains coordinates.
And let's also say we want to preview all points on the map to create a marker clustering. (Only inside of the boundaries of data view). As in the figure below.

enter image description here


This data is not balanced I suppose? what does that mean?

So why to use GiST or SP-GiST?

Best Answer

An exact answer is not possible without testing on your actual data, as this is highly dependent on the data size, structure, corresponding PostgreSQL internals, and the exact queries you intent to use the index for.

Generally speaking, it is perfectly fine to assume a GIST index is what you want.

Some basic considerations:

  • GIST has full operator support, including (k)NN searches
    SP-GIST doesn't support (k)NN as of yet, and supports fewer operators (which is probably not a real issue, though)
  • GIST isn't overly sensitive to the spatial distribution (homogeneous/consistently spaced vs heterogeneous/blobs of geometries) and the topology (many overlaps vs isolated distribution) of your geometries
    SP_GIST is most effective for non-overlapping geometries, and boost searches for spatially homogeneous distributions, due to its Spatial Partitioning
  • GIST creation time is rising slightly non-linear with the amount of data it has to ingest, but has a an overall stable increase (ballpark figure: 20 minutes for 100 million rows (points; global distribution))
    SP_GIST is likely faster for smaller amount of data, but tends to have a significant performance drop after a few hundred million geometries compared to GIST
  • GIST indexes have a non-trivial storage impact (ballpark figure: 5GB for 100 million geometries), but only BRIN indexes really make a difference here
    SP-GIST has a few percentages less space requirement

However:

Since it seems you are having homogeneously distributed POINTs, you could definitely try the SP_GIST index and see if you get more performance out of it; this is still dependent on other factors that are linked to PG internals, relation and result statistics, though.

But it will likely be slower if all you filter for is all points within a large bbox, as this is better covered with GIST.


Further reading:

  • a presentation of an old university colleague; his collection of slides, on different topics, are extremely insightful!
  • a direct link from said presentation with more details to SP_GIST; all other index types are covered, too
Related Question