[GIS] How to speed up this simple MySQL points in the box query

coordinate systemMySQLoptimizationspatial-database

MySQL doesn't have nearest neighbor search so. So I try to come up with a "distance" that's reasonable so the number of returned query is around 20-200.

That way innodb doesn't have to compute distance for like 30k rows if we only need 20. The latitude and longitude is also stored in a myisam table called tableauxiliary

Well, even after that, queries still take 13 seconds and sometimes 30 seconds.

SELECT TB.ID ,
  TB.Latitude,
  TB.Longitude,
  111151.29341326*SQRT(pow(-6.185-TB.Latitude,2)+pow(106.773-TB.Longitude,2)*0.98839228980165) AS Distance
FROM
  `tablebusiness` AS TB
  join tableauxiliary as TA on TA.BusinessID=TB.ID
WHERE
  MBRContains(
        GeomFromText (
            'MULTIPOINT(-6.1928749092968 106.7651250907,-6.1771250907032 106.7808749093)'
            ),
            TA.Latlong) 
  AND MATCH (TA.FullTextSearch) AGAINST ('re*' IN BOOLEAN MODE) 
AND TA.Prominent >15 
ORDER BY
  Distance

Number of matches are 32 by the way.

Showing rows 0 - 29 ( 32 total, Query took 16.1845 sec)

Also an even simpler query where we do not compute distance also take 16 seconds

SELECT TB.ID
FROM
  `tablebusiness` AS TB
  join tableauxiliary as TA on TA.BusinessID=TB.ID
WHERE
  MBRContains(
        GeomFromText (
            'MULTIPOINT(-6.1928749092968 106.7651250907,-6.1771250907032 106.7808749093)'
            ),
            TA.Latlong) 
  AND MATCH (TA.FullTextSearch) AGAINST ('res*' IN BOOLEAN MODE)    
AND TA.Prominent >15 

I can understand if mysql cannot do nearest neighbor search. What I am doing now is a simple search points in a box here. What am I missing?

The explanation for the query is:

id  select_type     table   type    possible_keys   key     key_len     ref     rows    Extra
1   SIMPLE  TA  fulltext    PRIMARY,Prominent,LatLong,LatLong_2,FullTextSearch  FullTextSearch  0       1   Using where
1   SIMPLE  TB  eq_ref  PRIMARY     PRIMARY     752     gensurv_isikota.TA.BusinessID   1   Using index

It basically says only one row is computed for each of the process.

number of data is 1.6 million, but number of data in a box is only 30

Best Answer

You can export out the data and import into a MyISAM MYSQL this way you will then have spatial indexing which will speed up the entire process by many times. PostGIS would be even faster.

I don't think your queries are wrong - just needs to be a spatial table with index.

Types of Spatial dependent on the database software used:

Spatial indices are used by spatial databases (databases which store information related to objects in space) to optimize spatial queries. Indexes used by non-spatial databases cannot effectively handle features such as how far two points differ and whether points fall within a spatial area of interest. Common spatial index methods include:

Grid (spatial index)
Z-order (curve)
Quadtree
Octree
UB-tree
R-tree: Typically the preferred method for indexing spatial data. Objects (shapes, lines and points) are grouped using the minimum bounding rectangle (MBR). Objects are added to an MBR within the index that will lead to the smallest increase in its size.
R+ tree
R* tree
Hilbert R-tree
X-tree
kd-tree
m-tree - an m-tree index can be used for the efficient resolution of similarity queries on complex objects as compared using an arbitrary metric.

Source: http://en.wikipedia.org/wiki/Spatial_database#Spatial_index