How can I read in a layer's data using the Python osgeo/ogr library and then export the whole attribute table, plus an extra column with the geometries encoded as WKT, to CSV?
I know that this is possible using the ogr2ogr command-line tool. The command to run from my terminal would be something like this:
ogr2ogr -overwrite -f CSV -lco GEOMETRY=AS_WKT -skipfailures "path\to\output.csv" "path\to\input.gdb" "input_layer_name"
The command above takes the "input_layer_name" layer from the input GDB and exports it to a CSV file containing all columns in the original attribute table, plus another column called WKT that contains the geometries' WKT strings.
But I'm not interested in using the ogr2ogr application. I'm trying to do this from within a Python script. Sadly, using Python's subprocess library to run the command above from inside Python is not an option here.
I've cobbled together a script that almost does what I need, but it still has a few issues:
import os
from osgeo import ogr
# Use OGR specific exceptions
ogr.UseExceptions()
# Definitions for input file name and layer name
inDriverName = 'OpenFileGDB'
inGDBPath = 'path/to/input.gdb'
inLayerName = 'input_layer_name'
inDriver = ogr.GetDriverByName(inDriverName)
inDataSource = inDriver.Open(inGDBPath, 0)
inLayer = inDataSource.GetLayerByName(inLayerName)
inLayerIDColname = inLayer.GetFIDColumn()
# Name of the new CSV
outFile = "path/to/output.csv"
outLayerName = inLayerName + '_wkt'
outWKTColName = 'WKT'
outDriver = ogr.GetDriverByName("CSV")
# Remove the output CSV if it already exists
if os.path.exists(outFile):
    outDriver.DeleteDataSource(outFile)
# Create the output CSV
outDataSource = outDriver.CreateDataSource(outFile)
outLayer = outDataSource.CreateLayer(outLayerName)
# Adding ID and WKT fields
idField = ogr.FieldDefn(inLayerIDColname, ogr.OFTInteger)
wktField = ogr.FieldDefn(outWKTColName, ogr.OFTString)
outLayer.CreateField(idField)
outLayer.CreateField(wktField)
outFeatureDefn = outLayer.GetLayerDefn()
# Making sure the feature reader is reset
inLayer.ResetReading()
# Iterating over every feature in the input layer
for this_inFeature in inLayer:
    # Extracting the input feature's ID, geometry and WKT
    this_FID = this_inFeature.GetFID()
    this_inGeom = this_inFeature.GetGeometryRef()
    this_inWkt = this_inGeom.ExportToIsoWkt()
    # Adding the new ID and WKT data to the output file
    outFeature = ogr.Feature(outFeatureDefn)
    outFeature.SetField(inLayerIDColname, this_FID)
    outFeature.SetField(outWKTColName, this_inWkt)
    outLayer.CreateFeature(outFeature)
    # Clearing the input Feature
    this_inFeature = None
# Releasing the input and output files
inDataSource = None
outDataSource = None
inLayer = None
outLayer = None
I currently see three problems with the method shown above:
- I need to "manually" iterate over every feature and export them one by one, which can be pretty slow. Is there an already-built vectorized or one-line approach for this?
- The output ID field is generated with quotes (see image below). It should be exported as an integer.
- The approach above only exports the ID and WKT fields. How can I make sure all original fields get included alongside the newly-created WKT column?
I know this is a long post, but ultimately my question is pretty simple: I just want to figure out how to read in layers with osgeo/ogr and export them to CSV with a WKT column, all from inside Python.
PS: I know that the specific input file doesn't matter much, but here's the input file I'm working with: ZipFile with GDB. The zipfile contains some other stuff too, but I'm focusing on the GDB – specifically, the layer called "TxDOT_Roadway_Linework".
Best Answer
Something like (untested):