[GIS] Why is PostGIS query not using spatial index

geodjangopostgispostgresql

I am having an issue with a spatial query that is not making use of indexes.

My schema is:

CREATE TABLE geoplanet_place
(
  woeid integer NOT NULL,
  "name" character varying(300) NOT NULL,
  admin_1 character varying(300),
  coords geometry NOT NULL,
  bbox geometry NOT NULL,
  CONSTRAINT geoplanet_place_pkey PRIMARY KEY (woeid),
  CONSTRAINT enforce_dims_bbox CHECK (st_ndims(bbox) = 2),
  CONSTRAINT enforce_dims_coords CHECK (st_ndims(coords) = 2),
  CONSTRAINT enforce_geotype_bbox CHECK (geometrytype(bbox) = 'POLYGON'::text OR bbox IS NULL),
  CONSTRAINT enforce_geotype_coords CHECK (geometrytype(coords) = 'POINT'::text OR coords IS NULL),
  CONSTRAINT enforce_srid_bbox CHECK (st_srid(bbox) = 4326),
  CONSTRAINT enforce_srid_coords CHECK (st_srid(coords) = 4326)
)
WITH (
  OIDS=FALSE
);

My indexes are:

CREATE INDEX geoplanet_place_bbox_id
  ON geoplanet_place
  USING gist
  (bbox);

CREATE INDEX geoplanet_place_coords_id
  ON geoplanet_place
  USING gist
  (coords);

Query A:

SELECT "geoplanet_place"."woeid", "geoplanet_place"."name", "geoplanet_place"."admin_1"
FROM "geoplanet_place"
WHERE "geoplanet_place"."bbox" && ST_Expand(ST_GeomFromText('POINT (-71.4119869999999963 41.8238720000000015)', 4326), 16093.44)
    AND ST_distance_sphere("geoplanet_place"."coords", ST_GeomFromText('POINT (-71.4119869999999963 41.8238720000000015)', 4326)) <= 16093.44
ORDER BY "geoplanet_place"."woeid" ASC
LIMIT 5;

Running EXPLAIN ANALYZE on Query A yields:

 Limit  (cost=0.00..56.00 rows=5 width=24) (actual time=1440.482..1451.604 rows=5 loops=1)
   ->  Index Scan using geoplanet_place_pkey on geoplanet_place  (cost=0.00..20018585.21 rows=1787420 width=24) (actual time=1440.477..1451.583 rows=5 loops=1)
         Filter: ((bbox && '0103000020E61000000100000005000000CFF6E80D6D92CFC0134548DDCE59CFC0CFF6E80D6D92CFC02BC58EC6A183CF406F13EE95034BCF402BC58EC6A183CF406F13EE95034BCF40134548DDCE59CFC0CFF6E80D6D92CFC0134548DDCE59CFC0'::geometry) AND (st_distance_sphere(coords, '0101000020E610000023D8B8FE5DDA51C0890B40A374E94440'::geometry) <= 16093.44::double precision))
 Total runtime: 1451.657 ms
(4 rows)

I guess the plan is using the index because the query completes quickly, but it doesn't mention the index anywhere.


Query B is Query A rewritten to use ST_DWithin(). This function call is supposed to be equivalent to Query A's WHERE clause.
Query B:

SELECT "geoplanet_place"."woeid", "geoplanet_place"."name", "geoplanet_place"."admin_1"
FROM "geoplanet_place"
WHERE ST_DWithin("geoplanet_place"."bbox", ST_GeomFromText('POINT (-71.4119869999999963 41.8238720000000015)', 4326), 16093.44)
ORDER BY "geoplanet_place"."woeid" ASC
LIMIT 5;

Running EXPLAIN on Query B yields:

 Limit  (cost=1884535.96..1884535.97 rows=5 width=24)
   ->  Sort  (cost=1884535.96..1884535.98 rows=9 width=24)
         Sort Key: woeid
         ->  Seq Scan on geoplanet_place  (cost=0.00..1884535.82 rows=9 width=24)
               Filter: ((bbox && '0103000020E61000000100000005000000CFF6E80D6D92CFC0134548DDCE59CFC0CFF6E80D6D92CFC02BC58EC6A183CF406F13EE95034BCF402BC58EC6A183CF406F13EE95034BCF40134548DDCE59CFC0CFF6E80D6D92CFC0134548DDCE59CFC0'::geometry) AND ('0101000020E610000023D8B8FE5DDA51C0890B40A374E94440'::geometry && st_expand(bbox, 16093.44::double precision)) AND _st_dwithin(bbox, '0101000020E610000023D8B8FE5DDA51C0890B40A374E94440'::geometry, 16093.44::double precision))
(5 rows)

I didn't allow EXPLAIN ANALYZE on Query B to finish because it was taking a long time.

I ran ANALYZE to see if it would help, but nothing changed:

ANALYZE VERBOSE geoplanet_place (bbox, coords);

Questions:

  1. Why does the result of EXPLAIN on Query A not say anything about indexes?
  2. Why does Query B result in a different query plan when it should be equivalent to Query A?
  3. What must I do so that Query B will be fast?
  4. Is there a problem with my indexes? Will changing the statistics on the geometry columns change anything?

Links:

  1. Building Indexes for PostGIS: http://postgis.net/docs/using_postgis_dbmanagement.html#idm2459
  2. ST_DWithin: http://postgis.net/docs/ST_DWithin.html
  3. Statistics Used by the Planner: http://www.postgresql.org/docs/8.4/interactive/planner-stats.html

Best Answer

Your first query is using an index just not the spatial one. See the Index Scan using geoplanet_place_pkey. So it's more efficient for it to use the id key since you are doing an ORDER by the column and your spatial filter covers the whole table.

The spatial index is not used because your ST_Expand is too big. You have a geometry but its in long lat. The expand units are therefore in long lat. My god I can't fathom how big 16093.44 degrees is. that would cover your whole table and then some so therefore the index is useless and Postgres is smart enough to realize that.

If you are going to do this use geography data type and then your 16093.44 would be in meters and valid and you wouldn't need that ST_Spheriod call.

since the ST_Dwithin geograpy call would take care of everything.

e.g.


CREATE TABLE geoplanet_place
(
  woeid integer NOT NULL,
  "name" character varying(300) NOT NULL,
  admin_1 character varying(300),
  coords geography(POINT,4326) NOT NULL,
  bbox geography(POLYGON,4326) NOT NULL
)

and then instead of ST_GeomFromText use ST_GeogFromText

Related Question