[GIS] Point in polygon algorithm for multiple polygons

algorithm, google maps, point-in-polygon, postgis

I have a Google map with a bunch of polygons on it.

Here is the problem I'm interested in: given a (lat, lng) point, what's the best way to determine all the polygons that this point lies in?

The obvious way is to run a "point in polygon" algorithm iteratively for each polygon, but I was wondering whether there is a more efficient algorithm to answer such queries, especially if you have thousands of polygons.

Best Answer

As with almost all such questions, the optimal approach depends on the "use cases" and how the features are represented. The use cases are typically distinguished by (a) whether there are many or few objects in each layer and (b) whether either (or both) layers allow for precomputing some data structures; that is, whether one or both of them is sufficiently static and unchanging to make the investment in precomputation worthwhile.

In the present case, this yields the following scenarios. Normally the points are dynamic: that is, they are not given beforehand. (If they are available in advance, or in very large groups, some optimizations based on sorting them will be available.) Let Q be the number of query points and P be the number of polygon vertices.

Vector polygon data

(1) Few points, few polygon vertices in toto. Use a brute-force procedure, such as the classic line-stabbing algorithm. For any decent method, the cost is O(P*Q), because it costs O(1) time to compare a point to a polygon edge and all such comparisons have to be made.
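To make the brute-force case concrete, here is a minimal sketch in Python of the classic line-stabbing (ray-casting) test, assuming each polygon is represented simply as a list of (x, y) vertex pairs; all names here are illustrative rather than taken from any particular library:

```python
from typing import List, Tuple

Point = Tuple[float, float]
Polygon = List[Point]   # vertex ring; the closing vertex need not be repeated

def point_in_polygon(pt: Point, poly: Polygon) -> bool:
    """Line-stabbing test: cast a horizontal ray to the right of the
    point and count how many polygon edges it crosses."""
    x, y = pt
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x_cross > x:
                inside = not inside
    return inside

def polygons_containing(pt: Point, polygons: List[Polygon]) -> List[int]:
    """Brute force: test the point against every polygon, O(P) per query."""
    return [i for i, poly in enumerate(polygons) if point_in_polygon(pt, poly)]
```

Each edge comparison is O(1), and every edge of every polygon is visited for every query point, which is where the O(P*Q) total comes from.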

(2) Possibly many polygon vertices, but they are dynamic: each time a point is used in the query, the polygons might all have changed. Again use a brute-force algorithm. The cost is still O(P*Q), which will be large because P will be large, but there's no helping that. If the changes are small or controlled (e.g., the polygons are slightly changing shape or simply moving around slowly) you might be able to use a version of the next solution and find an efficient way to update the data structures as the polygons change. That would likely be a matter for original research.

(3) Many polygon vertices and static polygons (that is, the polygon layer will rarely change). Precompute a data structure to support the search (which could be based on a line sweep or a quadtree algorithm). The cost of precomputation for these algorithms is O(P*log(P)), but the cost of the queries becomes O(Q*log(P)), so the total cost is O((P+Q)*log(P)).
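One readily available precomputed structure for this scenario is an R-tree over the polygons. Below is a sketch using Shapely's STRtree (this assumes Shapely 2.0 or later, where query returns integer indices of bounding-box hits, which still need an exact containment check):

```python
# Assumes Shapely >= 2.0, where STRtree.query returns integer indices.
from shapely.geometry import Point, Polygon
from shapely.strtree import STRtree

polygons = [
    Polygon([(0, 0), (4, 0), (4, 4), (0, 4)]),
    Polygon([(2, 2), (6, 2), (6, 6), (2, 6)]),
    # ... thousands more
]

tree = STRtree(polygons)              # one-time O(P*log(P)) precomputation

def polygons_containing(lng, lat):
    pt = Point(lng, lat)
    candidates = tree.query(pt)       # bounding-box hits only
    return [i for i in candidates if polygons[i].contains(pt)]

print(polygons_containing(3.0, 3.0))  # both squares contain the point -> [0, 1]
```

Building the tree is the one-time O(P*log(P)) investment; each query then only has to examine the few polygons whose bounding boxes actually contain the point.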

Some improvements are available in special cases, such as

(a) All polygons are convex (preprocessing the polygons can be done more quickly),

(b) All polygon interiors are disjoint, in which case you can think of their union as being a single polygon (which allows for straightforward efficient algorithms, such as those based on triangulation), and

(c) Most polygons are not very tortuous--that is, they occupy large portions of their bounding boxes--in which case you can do an initial test based on the bounding boxes only and then refine that solution. This is a popular optimization (see the sketch after this list).

(d) The number of points is large. Sorting them might improve the timing. For instance, when implementing a left-to-right line sweep point-in-polygon algorithm, you would sort the points on their first coordinate, allowing you to sweep over the points at the same time you sweep over the polygon edges. I'm not aware that such an optimization has been published. One that has been published, though, is to perform a constrained triangulation of the union of all the points and polygon vertices: once that triangulation is complete, identifying the interior points should be quick. Computational cost will scale as O(Q*log(Q) + (P+Q)*log(P+Q)).
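Improvement (c) can be bolted onto the brute-force approach with very little code: precompute each polygon's bounding box and run the full test only when the cheap box test passes. A sketch, reusing point_in_polygon and the plain vertex-list representation from the first example:

```python
def bounding_box(poly):
    """Axis-aligned bounding box of a vertex list."""
    xs = [x for x, _ in poly]
    ys = [y for _, y in poly]
    return min(xs), min(ys), max(xs), max(ys)

def polygons_containing_bbox(pt, polygons, boxes):
    x, y = pt
    hits = []
    for i, (xmin, ymin, xmax, ymax) in enumerate(boxes):
        if xmin <= x <= xmax and ymin <= y <= ymax:   # cheap reject
            if point_in_polygon(pt, polygons[i]):     # expensive confirm
                hits.append(i)
    return hits

boxes = [bounding_box(p) for p in polygons]           # precompute once
```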

Raster polygon data

This is incredibly easy: view the polygon layer as a binary indicator raster (1=inside a polygon, 0=outside). (This could require a lookup table to convert raster values to inside/outside indicators.) Each point probe now requires O(1) effort to index the raster cell and read its value. Total effort is O(Q).
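A sketch of the raster probe, assuming the polygon layer has already been burned into a NumPy array at a known origin and cell size (all names and values below are illustrative):

```python
import numpy as np

# Illustrative geotransform: a global grid at 0.1 degree resolution.
x_origin, y_origin = -180.0, 90.0    # upper-left corner of the raster
cell_size = 0.1                      # degrees per cell
grid = np.zeros((1800, 3600), dtype=np.uint8)   # 1 = inside a polygon, 0 = outside

def inside_any_polygon(lng, lat):
    col = int((lng - x_origin) / cell_size)
    row = int((y_origin - lat) / cell_size)     # rows increase downward
    return bool(grid[row, col])                 # O(1) per probe
```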

In general

A nice hybrid solution in the case of many static vector polygons (vector case 3 above) is initially to rasterize the polygons, perhaps even with a coarse resolution, this time distinguishing any cells intersecting any part of a polygon boundary (give them a value of 2, say). Using a raster probe (cost: O(1)) typically results in a definite answer (the point is known to be inside or outside), but occasionally results in an indefinite answer (the point falls in a cell through which at least one edge passes), in which case the more expensive O(log(P)) vector query is made. This method incurs some extra storage cost for the raster, but in many cases even a small raster (one MB will allow for a 2000 by 2000 raster that stores {0,1,2,null} values) can confer huge advantages in computational time. Asymptotically, the computational effort is the same as for a vector solution, but in practice it is O(Q + P*log(P)) and possibly as low as O(Q+P) (achieved by using a very fine resolution for the raster and using brute-force methods for the very rare vector queries that have to be performed).
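A sketch of the hybrid probe, assuming the raster now stores 0 = outside, 1 = inside, 2 = a polygon boundary crosses the cell, and reusing the grid parameters and the STRtree/polygons from the earlier sketches for the occasional exact query:

```python
from shapely.geometry import Point

def inside_any_polygon_hybrid(lng, lat):
    col = int((lng - x_origin) / cell_size)
    row = int((y_origin - lat) / cell_size)
    cell = grid[row, col]                 # 0, 1, or 2
    if cell != 2:
        return cell == 1                  # definite answer, O(1)
    # Ambiguous cell: an edge passes through it, so fall back to the
    # O(log(P)) vector query against the precomputed STRtree.
    pt = Point(lng, lat)
    return any(polygons[i].contains(pt) for i in tree.query(pt))
```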
