[GIS] Optimisation of MySQL JOIN based on a Within match

I have a table of points and boundaries, and am trying to add the boundary ID which a point is within. However, using a LEFT JOIN ON Within(location, boundary) it is taking about 3.5 hours match 450,000 points against 350 boundaries. Is there a way to optimise this join?

In more detail:

I have two tables in MySQL 5.6, one of which contains points, each stored as a point and the other containing boundaries, each stored as a geometry:

-- Table of locations, around 0.5 million points
CREATE TABLE locations (
  id INT(11) NOT NULL PRIMARY KEY,
  longitude float(11,6) DEFAULT NULL,
  latitude float(10,6) DEFAULT NULL,
  lonLat point NOT NULL DEFAULT '',
  boundaryId INT(11) DEFAULT NULL
) ENGINE=MyISAM;

-- Populate the lonLat field
UPDATE locations SET lonLat = POINTFROMTEXT(CONCAT('point(',longitude,' ',latitude,')')) WHERE longitude IS NOT NULL AND latitude IS NOT NULL;

-- Add spatial index on lonLat
ALTER TABLE locations ADD SPATIAL INDEX(lonLat);

-- Table of around 350 exact boundaries, some overlapping
CREATE TABLE IF NOT EXISTS boundaries (
  id INT(11) NOT NULL PRIMARY KEY,
  llgeom geometry NOT NULL
);

-- Add spatial index on boundary llgeom:
ALTER TABLE `boundaries` ADD SPATIAL(`llgeom`);

I have a query which updates the location table with the boundary ID that the point for that row is within:

UPDATE locations
LEFT JOIN boundaries ON Within(lonLat, llgeom)
SET boundaryId = boundaries.id;

Note that both lonLat and llgeom both have spatial indexes on them already.

With around 450,000 points and 350 geometries, running on MySQL 5.6, this takes around 3.5 hours. Doing a test limited to just 14 rows takes about 2.1 seconds.

If I run an EXPLAIN, this shows that no indexing is being used:

This shows that type is ALL in both cases, which is "the worst join type and usually indicates the lack of appropriate indexes on the table."

Is there some improvement I can make which will give much better performance, using indexes?

NB Using the ST_Within function (which gives true boundaries, rather than simplified bounding-box matching) for those same 14 rows takes much longer, 83 seconds:

UPDATE locations
LEFT JOIN boundaries ON ST_Within(lonLat, llgeom)
SET boundaryId = boundaries.id;

However, I have a routine called reallyWithin which has the same result but takes about 2.3 seconds. But whichever of the three functions is used (the bbox Within, the procedure reallyWithin, or the official ST_Within), this works out too slow for 450,000 points.

[GIS] Optimisation of MySQL JOIN based on a Within match

Best Answer

Related Question

Best Answer

Related Solutions

[GIS] How to join MySQL database to QGIS polygons

[GIS] Creating spatial-join which will calculate sum of values in certain field behind attribute in ArcMap

Related Question