I am having trouble understanding how the S2 library works so things in this question may make no sense, but I have a goal I would like to accomplish. Lets say I have many lat/lng point I want to store in a database. The database would be a simple key/value type database (e.g get("MyCoolKey")->returns->"MyCoolValue" )
. Later based of a lat/lng point I want to lookup in the database to get all the nearest points (say within 100km).
If I was doing something similar with geohashes I would convert my lat/lng point into a geohash:
48.669, -4.329 -> gbsuv
Then I would store this geohash in the database. I would do the same for many other points. Then to find nearby points I would query for all points in that geohash and all the neighbor:
gbsvh gbsvj gbsvn
gbsuu gbsuv gbsuy
gbsus gbsut gbsuw
I have the following functions I can use to accomplish this goal of mine using s2: https://github.com/mapbox/node-s2/blob/master/API.md .
From what I understand I need to convert the lat/lng points into a cellid? Then I will be able to somehow get nearby points?
Using these available functions how would I then:
- Take a lat/lng point and "convert" it
- Then based of some lat/lng point find other points in the database
Best Answer
The S2 Geometry library is complex, but for your needs you can use a partial implementation, as some Javascript port.
You can use NodeJS installing the s2-geometry package (e. g. by
npm install s2-geometry
), or this CDN link for HTML pages.Short answer
Use the
S2.latLngToKey(lat,lon,level)
to obtain key=YourCoolValue of your pointThe
key
is like a base4 Geohash or a tile-quadkey. It is a string with theface
of the S2 Cube andface_pos
, the hierarchical position (base4) of the cell in the hierarchical (Hilbert) grid. The result of this example isWhere you can see same big prefix for two near points,
2/100022000031
and only little prefix when the points are not near,2/10002
.About the distance that the commom prefix represents, check the S2 cell Statistics: level 13 seems adequate for check 1km, that is the 13-digits
Tests
I done my tests using the Javascript S2 geometry library, and starting with a sample of well-known places, obtained by this Wikidata-query:
Note: to add other objects or check more details use QID. For instance the first sample have
QID=178114
so you can check details withhttp://wikidata.org/entity/
URL: http://wikidata.org/entity/Q178114Complete Javascript code, with sample data and illustrating options for encoding:
Results:
The column
idHex
is the standard hexadecimal (base16) representation of the S2 Geometry Cell identifier (cell ID). The columnFace/Key
is an alternative representation of the same cell ID.Each cell of the S2 Geometry grid have an unique number (64 bits unsigned integer) used to identify ifself, that is a structure of bits expressing the position in the Hilbert curve (Face_pos) of a cube face.
You can use as geocode (said "MyCoolValue" in the question) any one, the
idHex
or the key=Face/Face_pos
.Cell ID and Cell Key have the same information, but ID mix the face and face_pos, and key not. Face is explicit and each digit of face_pos is a level in the grid structure.
Complete answer
The main advantage of S2 Geometry over Geohash is uniformity, the (near) constant shape and area of the S2 Geometry cells. A grid of equal-area is very important in statistics and another applications, see this Open Geospatial Consortium standard about the theme.
There are some (minor) advantages of Hilbert curve (S2) over Z-order curve (Geohash), but no one is perfect... In both, is possible two neighbors in the grid that have identifiers (keys) very different.
For application where you need 100% reliable result, use also the functions like
GetEdgeNeighbors()
of the s2-geometry package.Suggestion for base16 encoding
About convert "Face/Face_pos" to hexadecimal, is possible for example convert the base4
10002200
to hexa40a0
, but base4 with more one digit will result in entirely different (or invalid) hexadecimal; for example100022003
results in102801
... The "more one digit" ocurrs when you compare for example level 18 keys with level 19 keys.To avoid this problem, an extra digit must be added by a extend base16 algorithm.
Comparative results
The
idHex
can be reduced with the trailing 0's removed (e.g.89b7b7a1bd000000
reduced to89b7b7a1bd
). The information intokeyHex
andidHex
are same, differing only in one bit.The
keyHex
is better: represents the same cell with the same number of digits, but with a evident relationship with "Face/Face_pos" and always preserving prefix, with no distortions. The hexadecimal (base16) representation is aligned with the base4.Notice the folowing
idHex
in the table, to see the distortions:89b7b7a1bd
and level 19 is89b7b7a1bdc
, so concatenating one digitc
.89c259aa97
and level 19 is89c259aa964
, changed final7
to6
and concatenate4
.