PostGIS – Enhancing Performance for Geohash Aggregation

elasticsearchgeohashjsonpostgispostgresql

I'm using the GeoNames (https://www.geonames.org/) dataset and want to aggregate the points in geohash of a specific precision. Beforehand I'm filtering with an bbox. So this is the query I came up with:

With bbox AS(
SELECT name, the_geom FROM geonames 
WHERE ST_Contains((ST_MakeEnvelope(-29.79, 16.38, 64.05, 90.26, 4326)), the_geom)
)
SELECT COUNT(name), ST_GeoHash((the_geom),2) 
FROM bbox
   GROUP BY ST_GeoHash((the_geom),2)

output looks like this:

+-------+------------+
| count | st_geohash |
+-------+------------+
| 34200 | tm         |
+-------+------------+
| 3     | up         |
+-------+------------+
| ...   | ...        |
+-------+------------+  

and this is the query plan:

    "HashAggregate  (cost=24426.50..24429.00 rows=200 width=40) (actual time=5805.214..5805.229 rows=121 loops=1)"
"  Group Key: st_geohash(bbox.the_geom, 2)"
"  CTE bbox"
"    ->  Bitmap Heap Scan on geonames  (cost=376.34..24317.79 rows=3953 width=46) (actual time=454.394..2950.692 rows=3349419 loops=1)"
"          Recheck Cond: ('0103000020E610000001000000050000000AD7A3703DCA3DC0E17A14AE476130400AD7A3703DCA3DC0713D0AD7A39056403333333333035040713D0AD7A39056403333333333035040E17A14AE476130400AD7A3703DCA3DC0E17A14AE47613040'::geometry ~ the_geom)"
"          Filter: _st_contains('0103000020E610000001000000050000000AD7A3703DCA3DC0E17A14AE476130400AD7A3703DCA3DC0713D0AD7A39056403333333333035040713D0AD7A39056403333333333035040E17A14AE476130400AD7A3703DCA3DC0E17A14AE47613040'::geometry, the_geom)"
"          Rows Removed by Filter: 18"
"          Heap Blocks: exact=48141"
"          ->  Bitmap Index Scan on idx_geonames_geom  (cost=0.00..375.35 rows=11858 width=0) (actual time=444.950..444.950 rows=3349437 loops=1)"
"                Index Cond: ('0103000020E610000001000000050000000AD7A3703DCA3DC0E17A14AE476130400AD7A3703DCA3DC0713D0AD7A39056403333333333035040713D0AD7A39056403333333333035040E17A14AE476130400AD7A3703DCA3DC0E17A14AE47613040'::geometry ~ the_geom)"
"  ->  CTE Scan on bbox  (cost=0.00..88.94 rows=3953 width=64) (actual time=454.401..5030.976 rows=3349419 loops=1)"
"Planning time: 0.492 ms"
"Execution time: 5832.977 ms"

Is there a way to to increase the performence of this query ?
I'm also testing the same thing with Elasticsearch 6.6 and there the query with the same output is a lot faster.

{
    "aggregations" : {
        "zoomed-in" : {
            "filter" : {
                "geo_bounding_box" : {
                    "location" : {
                        "top_left" : "64.05, -29.79",
                        "bottom_right" : "16.38, 90.26"
                    }
                }
            },
            "aggregations":{
                "zoom1":{
                    "geohash_grid" : {
                        "field": "location",
                        "precision": 2,
                        "size": 100000
                    }
                }
            }
        }
    }
}

Best Answer

In PostgreSQL, common table expressions are always materialized. (This will change in version 12.)

To allow more optimizations, move bbox into a view, or inline it as a subquery:

SELECT COUNT(name), ST_GeoHash((the_geom),2) 
FROM (
  SELECT name, the_geom FROM geonames 
  WHERE ST_Contains((ST_MakeEnvelope(-29.79, 16.38, 64.05, 90.26, 4326)), the_geom)
  ) AS bbox
GROUP BY ST_GeoHash((the_geom),2)