[GIS] Query to Dissolve Nearby Polygons on Hive

Tags: dissolve, hadoop, hive, postgis, spatial-framework-hadoop

I am using the Esri spatial framework for Hadoop, which extends Hive with spatial types and operations.

My objective is to port a set of simple PostGIS queries to Hadoop so that they scale horizontally.

I have a grid with a count for each cell.

The objective of my query is to select all cells whose count is higher than a certain threshold, and to merge (dissolve) the selected cells that are adjacent to each other. For instance, in this case I would end up with 4 polygons.

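To do this in PostGIS, I use a combination of ST_Union, ST_SnapToGrid and ST_Dump:

CREATE TABLE exploded AS
SELECT
  (ST_Dump(st_union)).geom
FROM (
  -- snap the cells to a common grid so shared edges match, then merge them
  SELECT ST_Union(ST_SnapToGrid(geom, 0.0001)) AS st_union
  FROM grid
  WHERE ptcnt > threshold  -- replace threshold with a numeric value
) AS q;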

Unfortunately, none of these functions is available in Esri's spatial framework.

I can apply the threshold filter, but I have no way of aggregating nearby geometries based on their proximity (the trick that snapping to the grid performs in PostGIS):

create table exploded as select u as geom from (select geom as u from grid_cnt where ptcnt > 11467) as q; 

Can anybody think of a workaround (perhaps using a union)?

Best Answer

The ST_Bin and ST_BinEnvelope functions (added in 2014) may help in place of ST_SnapToGrid. There is an example in step 4 of this tutorial.

The ST_Aggr_Union function may also be useful. There is an example in the blog post announcing the aggregate functions.
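As a rough illustration of the second suggestion, a minimal HiveQL sketch might look like the following. It assumes the framework jars have already been added to the session, that the function is registered under the class name used in the framework's examples, and that grid_cnt.geom holds the framework's geometry type; the table name dissolved is made up for illustration, and the threshold is the one from the question.

```sql
-- Assumption: the Esri jars were already loaded with ADD JAR.
CREATE TEMPORARY FUNCTION ST_Aggr_Union AS 'com.esri.hadoop.hive.ST_Aggr_Union';

-- Keep only the cells above the threshold and union them in one aggregate.
-- Cells that share an edge should dissolve into a single part of the result.
CREATE TABLE dissolved AS
SELECT ST_Aggr_Union(geom) AS geom
FROM grid_cnt
WHERE ptcnt > 11467;
```

Note that this returns a single row holding one (multi-part) geometry rather than one row per cluster, so it covers the union step only; splitting the result into separate polygons, as ST_Dump does in PostGIS, would still need an extra step.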

(Disclosure: I am a collaborator on the GIS Tools for Hadoop at Esri.)
