[GIS] Alternatives to ogr2ogr for loading large GeoJSON file(s) to PostGIS

Tags: convert, geojson, postgis

I have a 7GB GeoJSON file that I would like to load into a PostGIS database. I have tried ogr2ogr, but it fails because the file is too big for it to load into memory and then process.

Are there any alternatives to ogr2ogr for loading this GeoJSON file into PostGIS?

The ogr2ogr error I get is:

ERROR 2: CPLMalloc(): Out of memory allocating -611145182 bytes. This
application has requested the Runtime to terminate it in an unusual
way. Please contact the application's support team for more
information.

Best Answer

Unfortunately JSON is, much like XML, badly suited for stream processing, so almost all implementations require the whole dataset to be loaded into memory. While that is fine for small datasets, in your case there is no option other than breaking the dataset into smaller, manageable chunks. (The negative allocation size in your error is what you typically see when a 32-bit size counter overflows while the whole file is being slurped in.)

Improving on Pablo's solution, here is one that does not require you to open the file in an editor and split it by hand, and that automates as much of the process as possible.

Copy the JSON file onto a Unix host (Linux, macOS) or install the Cygwin tools on Windows. Then open a shell and use vim to remove the first and last lines of the file:

$ vim places.json

Type dd to remove the first line, then SHIFT-G to move to the end of the file, and type dd again to remove the last line. Now type :wq to save the changes. This should take a couple of minutes at most.
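If you would rather not open a 7GB file in an editor at all, the same trim can be done non-interactively. A minimal sketch, assuming GNU sed (with BSD/macOS sed, write -i '' instead of -i):

$ sed -i '1d;$d' places.json

Here 1d deletes the first line and $d deletes the last one, in a single streaming pass through the file.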

Now we will harness the sheer power of Unix to split the file into more manageable chunks. Note that this relies on each feature sitting on its own line, which is how large GeoJSON files are typically laid out. In the shell type:

$ split -l 10000 places.json places-chunks-

Go grab a beer. This will split the file into many smaller files, each containing 10000 lines. You can increase the number of lines, as long as you keep each chunk small enough for ogr2ogr to manage.
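If you want a quick sanity check, wc tells you how many feature lines you started with (useful for picking the -l value) and ls counts the chunks that split produced:

$ wc -l places.json
$ ls places-chunks-* | wc -l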

Now we are going to stick a head and a tail onto each of the files:

$ echo '{"type":"FeatureCollection","features":[' > head
$ echo ']}' > tail
$ for f in places-chunks-* ; do cat head $f tail > $f.json && rm -f $f ; done

Go grab a snack. The first two commands simply create a header and a footer file with the correct contents (just for convenience, really), while the last adds the header and footer to each of the chunks we split above and then removes the headerless/footerless chunk (to save space).
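Before loading, it is worth validating the chunks. One caveat to watch for: if the feature lines were comma-separated, every chunk except the last will end with a trailing comma before the closing ]}, which strict JSON parsers reject (ogr2ogr's reader may still accept it). A quick check, assuming jq is installed:

$ for f in places-chunks-*.json ; do jq empty "$f" || echo "invalid: $f" ; done

If jq complains, strip the trailing comma from the last line of each raw chunk (for example with sed '$ s/,$//') before sticking the tail on, then re-run the wrapping loop.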

At this point you can hopefully process the many places-chunks-*.json files with ogr2ogr:

$ for f in places-chunks-*.json ; do ogr2ogr -your-options-here $f ; done
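As an illustration of what those options might look like when appending each chunk into a single PostGIS table (the connection parameters and the places table name below are placeholders, not something from the original question):

$ for f in places-chunks-*.json ; do ogr2ogr -f PostgreSQL PG:"dbname=mydb user=myuser" "$f" -nln places -append ; done

Here -nln sets the destination table name and -append makes successive chunks land in the same table (ogr2ogr creates it on the first pass).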