[GIS] The most efficient way to create a shapefile from a ResultSet using GeoTools

geotools shapefile

I am facing problems writing to a shapefile with GeoTools, with respect to both time and memory.

My workflow reads spatial data from an Oracle database, with attributes and coordinates (which can be points, lines, or polygons, and very large). To avoid OutOfMemoryError, I write one record to the shapefile and then flush the feature collection, using a DefaultTransaction. This works fine for roughly the first 700 features, but then the commit() call takes longer and longer to write to the shapefile: from about 2 minutes at the start, the time keeps increasing to 10-15 minutes, then 30 minutes, and keeps growing. I have tried different approaches, such as writing features to the shapefile in chunks of 500, but that did not help. I also observed that the total size of a shapefile written through GeoTools is far larger than one written with something like ArcObjects or MapObjects Java.

Here is my code. Please tell me if there is another, more efficient way to write to a shapefile.

Transaction transaction = new DefaultTransaction("create");

String typeName = newDataStore.getTypeNames()[0];
SimpleFeatureSource featureSource = newDataStore.getFeatureSource(typeName);
SimpleFeatureStore featureStore = (SimpleFeatureStore) featureSource;
featureStore.setTransaction(transaction);

while (_resultSet.next())
{
    setRecord();

    SimpleFeature feature = featureBuilder.buildFeature(null);
    featurecollection.add(feature);
    featureStore.addFeatures(featurecollection);

    transaction.commit();
    featurecollection.clear();
    featureCount++;
    if (featureCount % 10 == 0)
    {
        _logger.info("Rows Processed: " + featureCount);
    }
}

reportResults(featureCount);

transaction.close();
}

Apart from that, I also noticed something surprising with GeoTools: when we call transaction.commit(), it first copies the features into a temp space, and after only about 4,000 features the temp files grow to about 3 GB. Why that happens, I have no idea; any comments on this would help me understand the problem.

I tried it with 75,000 points and 8,000 lines (each line containing about 200 points). For lines, the commit() call behaves very badly. By the way, this is the minimum size of data I am testing with GeoTools.

Best Answer

I encountered a similar problem to the one you describe, and found the source of the problem after some code inspection.

The main problem is that during the addFeatures call, a CopyOnWriteArrayList is used at some point to store all the features to be written. (This happens regardless of whether you add your features one at a time or in batches.) This list implementation copies its entire backing array every time an item is added, so for a large number of features this causes enormous overhead.
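To see why this hurts, here is a small self-contained JDK benchmark (the class name, helper method, and element count are my own, not from GeoTools). Adding n items to a CopyOnWriteArrayList costs O(n^2) total work, because each add copies the whole backing array, while ArrayList amortizes to O(n):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public class CopyCostDemo {
    // Time how long it takes to append n integers to the given list.
    static long timeAdds(List<Integer> list, int n) {
        long start = System.nanoTime();
        for (int i = 0; i < n; i++) {
            list.add(i);
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        int n = 20_000;
        // CopyOnWriteArrayList copies its entire array on every add,
        // so this is quadratic in n.
        long cowNanos = timeAdds(new CopyOnWriteArrayList<Integer>(), n);
        // ArrayList only grows occasionally, so this is near-linear.
        long arrayNanos = timeAdds(new ArrayList<Integer>(), n);
        System.out.println("CopyOnWriteArrayList: " + cowNanos / 1_000_000 + " ms");
        System.out.println("ArrayList:            " + arrayNanos / 1_000_000 + " ms");
    }
}
```

On any JVM the gap widens rapidly as n grows, which matches the behaviour in the question: commit times that keep increasing with the number of features already written.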

By browsing through the code, I learned that this only occurs when a Transaction is actually used. If you do not specify one (so that GeoTools falls back on Transaction.AUTO_COMMIT mode), another implementation is used.

In short, if you drop all transaction-related code, your problem should be solved. In my case, the time for writing around 200k points dropped from 40 s to 13 s, and memory usage improved significantly.
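A minimal sketch of the transaction-free version, adapted from the code in the question (newDataStore, schema, featureBuilder, setRecord(), and the batch size of 1000 are placeholders or assumptions, not fixed GeoTools names):

    // No setTransaction() call: the store stays on Transaction.AUTO_COMMIT,
    // which avoids the CopyOnWriteArrayList code path entirely.
    String typeName = newDataStore.getTypeNames()[0];
    SimpleFeatureStore featureStore =
            (SimpleFeatureStore) newDataStore.getFeatureSource(typeName);

    List<SimpleFeature> batch = new ArrayList<>();
    while (_resultSet.next()) {
        setRecord();
        batch.add(featureBuilder.buildFeature(null));
        if (batch.size() >= 1000) {
            // Flush a bounded batch so memory stays flat.
            featureStore.addFeatures(new ListFeatureCollection(schema, batch));
            batch.clear();
        }
    }
    if (!batch.isEmpty()) {
        featureStore.addFeatures(new ListFeatureCollection(schema, batch));
    }

Batching is optional here, but it keeps heap usage bounded for very large result sets while still avoiding per-feature commits.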