[GIS] How to process GeoJSON data and create a Hive table for spatial analysis

gis-tools-hadoop, hadoop, hive, postgis, spatial-analyst

I am using the Esri GIS Tools for Hadoop to analyze JSON data generated by Esri products such as ArcGIS. Now I want to analyze spatial data from a PostGIS table, so I exported it from QGIS as GeoJSON. When I try to create a Hive table from this data using the GIS Tools for Hadoop, the data does not get loaded.

I have tried several approaches without success. Is there another way I can proceed?

Best Answer

It would help if you could describe your attempts in a bit more detail, so we can understand what happened.

In any case, in my experience there is no smooth pipeline for importing data from PostGIS (or any OGC-standard source) into Hadoop GIS, which is a shame. Esri has ticked the Free and Open Source Software box by providing this tool (which, by the way, is one of the few I know of that adds spatial support to big data), but it has failed on interoperability. AFAIK, its support for loading spatial types covers their proprietary JSON format, which is not really convenient unless you are using ArcGIS. What I have done so far is export the data from PostGIS as either WKT or GeoJSON, and then load it into Hive as delimited text.
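To give an idea, the export step can be done directly with psql (a minimal sketch, assuming a PostGIS table grid_10 with an id column and a geom geometry column):

-- Export id + GeoJSON geometry; PostgreSQL's text COPY format uses tab
-- as the default delimiter, which matches the Hive table definition below
\copy (SELECT id, ST_AsGeoJSON(geom) FROM grid_10) TO '/tmp/grid_10.tsv'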

-- External table over the exported files; the geometry stays as GeoJSON text
CREATE EXTERNAL TABLE grid_10 (id BIGINT, json STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION 's3://some-url/data/grids/geojson';
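One caveat: the ST_* functions are not built into Hive, so the jars from the GIS Tools for Hadoop have to be added and the functions registered first. A sketch follows; the jar paths are placeholders, and the class names should be checked against the version of the Spatial Framework for Hadoop you downloaded:

-- paths are placeholders for wherever you put the downloaded jars
ADD JAR /path/to/esri-geometry-api.jar;
ADD JAR /path/to/spatial-sdk-hive.jar;
CREATE TEMPORARY FUNCTION ST_GeomFromGeoJSON AS 'com.esri.hadoop.hive.ST_GeomFromGeoJson';
CREATE TEMPORARY FUNCTION ST_GeomFromText AS 'com.esri.hadoop.hive.ST_GeomFromText';
CREATE TEMPORARY FUNCTION ST_Point AS 'com.esri.hadoop.hive.ST_Point';
CREATE TEMPORARY FUNCTION ST_Contains AS 'com.esri.hadoop.hive.ST_Contains';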

This means the geometry is actually stored as text, but you can easily instantiate it on the fly in your queries using the ST_GeomFromGeoJSON or ST_GeomFromText functions. For instance:

CREATE TABLE grid_cnt AS
SELECT grid_10.id, count(grid_10.id) AS ptcnt
FROM grid_10 JOIN tweets
WHERE ST_Contains(ST_GeomFromGeoJSON(grid_10.json),
                  ST_Point(tweets.longitude, tweets.latitude))
GROUP BY grid_10.id;
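If you exported WKT instead, the same pattern works with ST_GeomFromText (a sketch; the table grid_10_wkt and its wkt column are hypothetical names for a WKT export loaded the same way):

SELECT g.id, count(g.id) AS ptcnt
FROM grid_10_wkt g JOIN tweets
WHERE ST_Contains(ST_GeomFromText(g.wkt),
                  ST_Point(tweets.longitude, tweets.latitude))
GROUP BY g.id;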

What I actually failed to do, and I suspect it is a bug, was to instantiate a spatial table with a geometry field using this conversion. You can read more about it in my post.
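For reference, the kind of statement that failed for me was along these lines (a sketch of the general shape, not a working recipe):

-- Attempt to materialize the geometry as a real column; in my tests this
-- did not produce a usable spatial table
CREATE TABLE grid_10_geom AS
SELECT id, ST_GeomFromGeoJSON(json) AS geom FROM grid_10;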