[GIS] Fastest way to convert shapefile to postgis in java

geotoolspostgisshapefile

I have been using geotools to read shapefiles and import them into postgis. Unfortunately, when large files are being imported, this leads to a massive lag time –8000% of shp2pgsql piped into pgsql– which I assume is caused mostly by individual inserts of data.

Assuming that the amount of time it takes to write the features to the dump
is not excessive, this should massively decimate the exorbitant amount of time required to import the shapefile.

My plan is to just convert all of the feature attributes into rows of strings and then just use the copy functionality in the jdbc to bulk load the data –but I do not know of any functions in the jdbc or elsewhere that can convert java objects into postgres dump comptable strings.

Below is a sample output from shp2pgsql –which is essentially what I want to replicate using java.

SET CLIENT_ENCODING TO UTF8;
SET STANDARD_CONFORMING_STRINGS TO ON;
BEGIN;
CREATE TABLE "raw_data"."mud" (gid serial,
"objectid" numeric(10,0),
"tcmud_name" varchar(50),
"shape_area" numeric,
"shape_len" numeric);
ALTER TABLE "raw_data"."mud" ADD PRIMARY KEY (gid);
SELECT AddGeometryColumn('raw_data','mud','geom','2277','MULTIPOLYGON',2);
COPY "raw_data"."mud" ("objectid","tcmud_name","shape_area","shape_len",geom) FROM stdin;
14  LAKEWAY MUD 9.60083055833e+004  1.24005179036e+003  0106000020E50800000100000001030000000100000005000000E03113F38C20474188DA1ADB9142634140726A6B94204741B07B4EE669426341007FCF91FE1F4741D0BE4D2668426341A03E7819F71F474190631C1B90426341E03113F38C20474188DA1ADB91426341
16  LAKEWAY MUD 1.99805337165e+004  5.90416771226e+002  0106000020E5080000010000000103000000010000000900000040CA100AE92F474110A13411B3466341E09D3C7DE42F474198B6CE82AA466341A0BA19BD912F4741F08A1B80AF466341940F2FE9912F4741E9D4FBCEB24663410005BDDD902F4741B0CC491BB646634120440ECA8C2F4741403B9018BC466341E0865D22ED2F4741D06DCE02BA466341E0AAB0DAEA2F474170988569B646634140CA100AE92F474110A13411B3466341
17  LAKEWAY MUD 2.55495765553e+004  6.75755882634e+002  0106000020E5080000010000000103000000010000000800000080183EE7062F4741A03A533557466341208801F3F52E4741E836A58E55466341805FC108D32E4741685B85F96446634140DE1C01CF2E4741D8E7CCEA6A466341A063D3DFC92E47415008A8CE71466341E0CBB53AF42E4741A848726374466341C0F32A05292F4741588B715F5A46634180183EE7062F4741A03A533557466341
20  LAKEWAY MUD 1.54327655151e+004  5.08037611343e+002  0106000020E5080000010000000103000000010000000500000000A3626F4F284741402A23586044634140FFC13E36284741A8F42ECF4D44634100E74A4A06284741604256135244634140943F531F284741409C39B66344634100A3626F4F284741402A235860446341
27  WTCMUD NO. 7    6.92786595456e+005  3.61805221358e+003  0106000020E50800000100000001030000000100000018000000C0063953CF464741107CA79ED538634100190A95D246474110AE1AA1DA386341C0240A3EDE474741D866062A0439634180FC86331F484741B8A81D80EE386341E03C4D782848474190155C7FC2386341203E3FAAED4747418056C1F598386341C0BE4FB8ED474741B0F4E3B28A386341A08A5238C947474180DBCB047E386341E036F588A6474741389250786438634140FEF57170474741205693EF523863416028CABB3F4747414897B23D503863410081AAA732474741989B78844F386341601450391A474741903F8F535138634140948698EB46474170D55FC754386341A0068BD6D9464741703CF3B85C386341001E6513CC464741E0D4F5E06238634140A9C648CC464741C8048B7274386341A06B007ACC4647415043AEA684386341E07EA77ECC464741D8E2993086386341A00B156BE0464741B845446E91386341008BC1BB15474741C004DF82AF386341C05D1BB7EC464741401E7AF3C0386341C0EF4FB5CB464741B0DF1CFCCE386341C0063953CF464741107CA79ED5386341
...
\.
CREATE INDEX "mud_geom_gist" ON "raw_data"."mud" USING GIST ("geom");
COMMIT;

To sumarize, I am looking for the fastest/easist/robust way to bulk upload any shapefile to postgis in java.

I do not care whether solutions use geotools, as I have found geotools very cumbersome for my very simple needs — changing the name of the geometry column from "the_geom" to "geom", for example, requires generating a brand new schema from scratch with the new geometry name, iterating through all the features of the shape datastore, generating a new feature in the new schema, and then individually mapping each attribute of the old schema to the corresponding attribute in the new schema!

Best Answer

I am not a Java expert but I do know that Java supports subprocesses, so you could do the sort of thing I'd do in Python as follows.

  • Write a simple loop to iterate over your shapefiles
  • For each shapefile, spawn a subprocess to call ogr2ogr to dump the shapefile in one go into PostGIS without having to create a dump file first
  • Rinse and repeat

GDAL/OGR has Java bindings so you can use them for other geoprocessing tasks too. I know it's not GeoTools but ogr2ogr is such a wonderful Swiss-Army-Knife of a tool that no GIS Professional should be without it!

EDIT:
You don't have to install FWToools to get ogr2ogr, you can get it here