[GIS] Writing GeoJSON stream into Shapefile using GeoTools

geojson geotools shapefile

I've been trying a few solutions to efficiently convert GeoJSON into a
shapefile without having to store all features in memory. I'm using
GeoTools 9.2.

The problem is not so much in how to stream the JSON but how to
efficiently write the features into the shapefile. I use
FeatureJSON#streamFeatureCollection to obtain an iterator. After some
googling, I found 3 different ways of writing a shapefile, namely:

Option 1: Repeatedly calling FeatureStore#addFeatures with a collection of, say, 1000 features, within a transaction.

  ListFeatureCollection coll = new ListFeatureCollection(type, features);
  Transaction transaction = new DefaultTransaction("create");
  featureStore.setTransaction(transaction);
  try {
    featureStore.addFeatures(coll);
    transaction.commit();
  } catch (IOException e) {
    transaction.rollback();
    throw new IllegalStateException(
        "Could not write some features to shapefile. Aborting process", e);
  } finally {
    transaction.close();
  }

This option is extremely slow. By profiling a few runs, I noticed that
about 50% of CPU time is spent on the method
ContentFeatureStore#getWriterAppend, presumably in order to reach the
end of the file before each transaction commit.
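To make that suspicion concrete, here is a rough cost model in plain Java. It is not GeoTools code and it rests on the assumption stated above (that every append has to skip over all records already in the file before it can write); the class and method names are made up for illustration. Under that assumption the skipped-record count grows quadratically with the number of batches, which would match the profile.

```java
// Hypothetical cost model, not GeoTools code: counts how many existing
// records would be skipped if every batch append had to scan from the start.
class AppendCostSketch {

    // Records skipped when n features are written in batches of batchSize
    // and each append first re-reads everything written so far.
    static long recordsSkipped(long n, long batchSize) {
        long skipped = 0;
        for (long written = 0; written < n; written += batchSize) {
            skipped += written; // scan past what is already in the file
        }
        return skipped;
    }

    public static void main(String[] args) {
        // one million features in batches of 1000
        System.out.println(recordsSkipped(1_000_000, 1_000));     // prints 499500000
        // the same features through a single writer opened once
        System.out.println(recordsSkipped(1_000_000, 1_000_000)); // prints 0
    }
}
```

If the model holds, a smaller batch means more re-scanning, while a single writer kept open for the whole stream never re-scans at all.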

Option 2: Obtaining an append writer directly from ShapefileDataStore and
writing 1000 features at a time within a transaction.

This option suffers from the same problem as Option 1.
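Both batched options need the streamed iterator cut into chunks of about 1000 features without loading the whole stream. A minimal sketch of that chunking in plain Java follows. The helper name forEachBatch is hypothetical, it uses Java 8's Consumer, and it works on java.util.Iterator, so a GeoTools FeatureIterator would need a thin adapter first (an assumption, not shown here).

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper, not part of GeoTools.
class BatchingSketch {

    // Drains the iterator in chunks of at most batchSize elements and hands
    // each chunk to the sink; only one chunk is built up at a time.
    static <T> void forEachBatch(Iterator<T> it, int batchSize,
            Consumer<List<T>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<T> batch = new ArrayList<>(batchSize);
        while (it.hasNext()) {
            batch.add(it.next());
            if (batch.size() == batchSize) {
                sink.accept(batch);
                // start a fresh list, since the sink may keep the old one
                batch = new ArrayList<>(batchSize);
            }
        }
        if (!batch.isEmpty()) {
            sink.accept(batch); // trailing partial chunk
        }
    }
}
```

In Option 1 the sink would wrap each chunk in a ListFeatureCollection and call addFeatures inside its own transaction.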

Option 3: Obtaining a feature writer from ShapefileDataStore and writing one feature at a time using Transaction.AUTO_COMMIT.

  FeatureWriter<SimpleFeatureType, SimpleFeature> writer = shpDataStore
      .getFeatureWriter(shpDataStore.getTypeNames()[0],
          Transaction.AUTO_COMMIT);

  while (jsonIt.hasNext()) {
    SimpleFeature feature = jsonIt.next();
    SimpleFeature toWrite = writer.next();
    for (int i = 0; i < toWrite.getType().getAttributeCount(); i++) {
      String name = toWrite.getType().getDescriptor(i).getLocalName();
      toWrite.setAttribute(name, feature.getAttribute(name));
    }
    writer.write();
  }
  writer.close();

Option 3 is the fastest, but I feel there should be a way to add a
larger number of features at a time to the shapefile efficiently
within a transaction.

On the other hand, an earlier comment on the GeoTools mailing list
noted:

The above would work for mid-sized data transfers, for massive ones
against databases it's better to adopt some sort of batching to avoid
having a single transaction with one million inserts, e.g., insert
1000, commit the transaction, insert another 1000, and so on. This
would work better against databases and against WFS servers, but not
against shapefiles, which instead work better with the massive
insert… to each his own.

Does this mean that the most efficient way of writing to a shapefile
is having all features in memory, rather than being able to append
features?

Best Answer

With help from people on the GeoTools mailing list, I came up with the following complete method. It takes a JSON InputStream, an optional known SimpleFeatureType, and a previously created ShapefileDataStore, and returns a shapefile-backed SimpleFeatureSource.

public static SimpleFeatureSource geoJSON2Shp(InputStream input,
        SimpleFeatureType schema, ShapefileDataStore shpDataStore)
        throws IOException {

    FeatureJSON fjson = new FeatureJSON(new GeometryJSON(15));

    SimpleFeatureType featureType = schema;

    if (featureType != null) {
        fjson.setFeatureType(featureType);
    }

    FeatureIterator<SimpleFeature> jsonIt = fjson
            .streamFeatureCollection(input);

    if (!jsonIt.hasNext()) {
        // nothing to write: release the JSON iterator before bailing out
        jsonIt.close();
        throw new IllegalArgumentException(
                "Cannot create shapefile. GeoJSON stream is empty");
    }

    FeatureWriter<SimpleFeatureType, SimpleFeature> writer = null;

    try {

        // use feature type of first feature, if not supplied
        SimpleFeature firstFeature = jsonIt.next();
        if (featureType == null) {
            featureType = firstFeature.getFeatureType();
        }

        shpDataStore.createSchema(featureType);

        writer = shpDataStore.getFeatureWriterAppend(
                shpDataStore.getTypeNames()[0], Transaction.AUTO_COMMIT);

        addFeature(firstFeature, writer);

        while (jsonIt.hasNext()) {
            SimpleFeature feature = jsonIt.next();
            addFeature(feature, writer);                
        }
    } finally {
        // release the writer and the JSON iterator even if writing fails
        if (writer != null) {
            writer.close();
        }
        jsonIt.close();
    }

    return shpDataStore.getFeatureSource(shpDataStore.getTypeNames()[0]);
}


private static void addFeature(SimpleFeature feature,
        FeatureWriter<SimpleFeatureType, SimpleFeature> writer)
        throws IOException {

    SimpleFeature toWrite = writer.next();
    for (int i = 0; i < toWrite.getType().getAttributeCount(); i++) {
        String name = toWrite.getType().getDescriptor(i).getLocalName();
        toWrite.setAttribute(name, feature.getAttribute(name));
    }

    // copy over the user data
    if (feature.getUserData().size() > 0) {
        toWrite.getUserData().putAll(feature.getUserData());
    }

    // perform the write
    writer.write();
}  
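
One thing to keep in mind with the method above: when no schema is supplied, the first feature decides the attribute set, and every later feature is copied into it by attribute name. The sketch below mimics that flow with plain maps instead of GeoTools types, so it only illustrates the logic; the names CopyByNameSketch, writeAll and project are hypothetical.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Plain-Java analogy of the method above; no GeoTools types involved.
class CopyByNameSketch {

    // Streams records to the sink, using the first record's keys as schema.
    static void writeAll(Iterator<Map<String, Object>> in,
            Consumer<Map<String, Object>> sink) {
        if (!in.hasNext()) {
            throw new IllegalArgumentException("stream is empty");
        }
        Map<String, Object> first = in.next();
        List<String> schema = new ArrayList<>(first.keySet());
        sink.accept(project(first, schema));
        while (in.hasNext()) {
            sink.accept(project(in.next(), schema));
        }
    }

    // Copies only the schema's attributes, looked up by name.
    static Map<String, Object> project(Map<String, Object> source,
            List<String> schema) {
        Map<String, Object> target = new LinkedHashMap<>();
        for (String name : schema) {
            // attributes missing from the source end up as null;
            // attributes outside the schema are silently dropped
            target.put(name, source.get(name));
        }
        return target;
    }
}
```

In this analogy a second record with keys id and extra, following a first record with keys id and name, comes out with name set to null and extra dropped. If the GeoJSON features do not all share the same properties, passing an explicit SimpleFeatureType is the safer choice.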