[GIS] Error when creating spatial index in MongoDB

geojson mongodb ogr2ogr spatial-index

I want to use MongoDB for some basic spatial calculations (on some fairly large data), but I get an error when trying to create an index on the US Counties shapefile from the US Census Bureau's TIGER data. I had to do some manipulation in the terminal to get this data into MongoDB in the first place, since Mongo can't ingest shapefiles directly:

$ ogr2ogr -f GeoJSON counties.geojson tl_2015_us_county.shp -t_srs EPSG:4326 # convert shapefile to geojson
$ jq --compact-output ".features" counties.geojson > counties_reformatted.geojson # reformat the .geojson because mongoimport throws an error if you don't
$ mongoimport -d mydb -c counties < counties_reformatted.geojson --jsonArray --batchSize 1 --drop # import to mongodb
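For anyone without jq, here is a minimal Python sketch of what the reformatting step does. A GeoJSON FeatureCollection is a single object that wraps a "features" array, while mongoimport with --jsonArray expects a top-level array of documents. The function name and the sample data are made up for illustration.

```python
import json

def features_as_array(feature_collection_text):
    """Return the 'features' list of a GeoJSON FeatureCollection as compact JSON."""
    collection = json.loads(feature_collection_text)
    # mongoimport --jsonArray expects a top-level array of documents,
    # while a FeatureCollection is a single object wrapping that array.
    return json.dumps(collection["features"], separators=(",", ":"))

# Tiny made-up FeatureCollection, only to show the shape of the data.
sample = json.dumps({
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature",
         "properties": {"NAME": "example"},
         "geometry": {"type": "Point", "coordinates": [-99.89, 36.59]}},
    ],
})

print(features_as_array(sample))
```

On the real file this loads everything into memory at once, which is roughly what the jq command does as well.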

I then create the index in the mongo shell with

> db.counties.createIndex( { "geometry" : "2dsphere" } )

but I get this error:

.
.
.
Edges 839 and 841 cross. Edge locations in degrees: [-99.8924200, 36.5932380]-[-99.8960610, 36.5932360] and [-99.8960600, 36.5932360]-[-99.8960630, 36.5932360]",
    "code" : 16755

This is along the border of Oklahoma and Kansas.
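To see what the message is describing, the two reported edges can be checked with an ordinary planar segment test. This is only a rough sketch: MongoDB validates geometry on the sphere, so a flat lon/lat test is an approximation, and the helper names here are hypothetical. The coordinates are copied from the error message, and Fraction is used so that differences in the seventh decimal place are not lost to floating-point rounding.

```python
from fractions import Fraction

def pt(lon, lat):
    """Build an exact (lon, lat) point from decimal strings."""
    return (Fraction(lon), Fraction(lat))

def orient(a, b, c):
    """Sign of the cross product (b - a) x (c - a): 1, -1, or 0 if collinear."""
    v = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return (v > 0) - (v < 0)

def on_segment(a, b, c):
    """True if c (already known to be collinear with a-b) lies within a-b."""
    return (min(a[0], b[0]) <= c[0] <= max(a[0], b[0])
            and min(a[1], b[1]) <= c[1] <= max(a[1], b[1]))

def segments_touch_or_cross(p1, p2, q1, q2):
    """Planar test: do segments p1-p2 and q1-q2 share at least one point?"""
    o1, o2 = orient(p1, p2, q1), orient(p1, p2, q2)
    o3, o4 = orient(q1, q2, p1), orient(q1, q2, p2)
    if o1 != o2 and o3 != o4:
        return True
    # Remaining cases: an endpoint collinear with, and inside, the other segment.
    return ((o1 == 0 and on_segment(p1, p2, q1))
            or (o2 == 0 and on_segment(p1, p2, q2))
            or (o3 == 0 and on_segment(q1, q2, p1))
            or (o4 == 0 and on_segment(q1, q2, p2)))

# Edge 839 and edge 841, as printed in the error message.
edge_839 = (pt("-99.8924200", "36.5932380"), pt("-99.8960610", "36.5932360"))
edge_841 = (pt("-99.8960600", "36.5932360"), pt("-99.8960630", "36.5932360"))

print(segments_touch_or_cross(*edge_839, *edge_841))
```

In the plane the test reports True: edge 839 ends at longitude -99.8960610, which falls inside edge 841 at the same latitude. In other words, the ring steps back on itself by about a millionth of a degree at that spot.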

I have been able to get this process to work with shapefiles of Mexico's states, so it has worked in the past. Where is the error coming from? The ogr2ogr conversion? Or something wrong with the shapefile in the first place?

The counties shapefile, tl_2015_us_county.shp, is in EPSG:4269 by default, so I suspected a problem in the conversion to EPSG:4326, but changing -t_srs to EPSG:4269 in my ogr2ogr script did not fix the problem. I need (I think) the counties in EPSG:4326 because I already have a much, much bigger points layer in this system.

Notes:

Maybe Stack Overflow would be a better place for this?

Inverted lat/lon is not the problem, and this workaround does not work for me.

Update:

I also tried using two different shapefiles (with different resolutions) from the US Census here, and I get a very similar error. When plotting the points that mongo says overlap, there is not a perfect coincidence with the points of the shapefile. This leads me to think that something is going wrong in my conversion process.

I'm guessing this could be solved by using really low resolution data, but I'm not really comfortable with that.

Also, spatial queries without the index are not really an option here. They take excruciatingly long without the index.

Version info:

os: Ubuntu 14.04
mongodb: 3.2.7
ogr2ogr: GDAL 1.10.1, released 2013/08/26
jq: 1.3

Shapefile (in case anyone wants to replicate this):

wget ftp://ftp2.census.gov/geo/tiger/TIGER2015/COUNTY/tl_2015_us_county.zip --no-parent --relative --recursive --level=2 --accept=zip --mirror

Best Answer

I didn't work through the issue with your data, but my view is that the error comes down to a limitation of the shapefile format. Topological consistency is neither enforced by nor inherent in the shapefile format (Esri's explanation here). The Mongo spatial indexing system does require topological consistency.

Because the source format does not enforce consistency, there are multiple ways for the process to go wrong, including inconsistent precision in the coordinates or rounding errors in the conversion.
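As a rough illustration of the precision point, here is a minimal Python sketch of one possible cleanup: round every vertex to a fixed number of decimals, then drop consecutive duplicates. This is an assumption about what might help, not something I have tested against the full county file. The function name and the choice of 5 decimals (roughly a metre) are hypothetical, and the input is just the four vertices from the error message in the question, not a complete ring.

```python
def round_and_dedupe(ring, ndigits=5):
    """Round each (lon, lat) pair and drop consecutive duplicate vertices."""
    cleaned = []
    for lon, lat in ring:
        vertex = (round(lon, ndigits), round(lat, ndigits))
        # Keep the vertex only if it differs from the one just before it.
        if not cleaned or cleaned[-1] != vertex:
            cleaned.append(vertex)
    return cleaned

# Vertices taken from the error message in the question (not a full ring).
vertices = [
    (-99.8924200, 36.5932380),
    (-99.8960610, 36.5932360),
    (-99.8960600, 36.5932360),
    (-99.8960630, 36.5932360),
]

print(round_and_dedupe(vertices))
```

At 5 decimals the last three vertices collapse into a single point, so the tiny back-and-forth step disappears. A real cleanup would still have to keep each ring closed and with at least four positions, and rounding alone will not repair every kind of invalid geometry, which is where the topology tools in a GIS come in.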

My answer is that you might need a GIS for what you are trying to achieve. A GIS (QGIS, for example) has tools for resolving topology problems. Of course, if you go all the way to a GIS, you could do the analysis there as well.