[GIS] Dump a GeoJSON FeatureCollection from Spatialite query with Python

geojsonpythonspatialite

Given a Spatialite database, holding a table with a geometry column and some other attribute columns. I want to query that table in a Python script and get a GeoJSON FeatureCollection string in return which stores one GeoJSON Feature per result row, each feature being composed of the geometry from the geometry column and all the other columns as attributes.

Connecting to a SQLite database with Spatialite extensions and doing a query in Python:

import sqlite3

# open database file    
conn = sqlite3.connect('MyDatabase.sqlite')

# load spatialite extensions. make sure that mod_spatialite.dll is somewhere in your system path.
conn.enable_load_extension(True)
conn.execute('SELECT load_extension("mod_spatialite.dll")')

# get a cursor
cursor = conn.cursor()

# compose a query
query = """
SELECT
    geometry,
    firstAttribute,
    secondAttribute,
    thirdAttribute,
FROM
    MyTable
;
"""

# execute query
cursor.execute(query)

# get results
results = cursor.fetchall()

However, results is now a list of tuples (one tuple per row) filled with strings (I never understood why this could be useful ever…), i.e.

[(b'\x00\x01\xe6\x10\x00\x00\xde\xe4\xb7\xe8d\xa9\x95\xbf\xf9\xf5Cl\xb0\xbcI@\xde\xe4\xb7\xe8d\xa9\x95\xbf\xf9\xf5Cl\xb0\xbcI@|\x01\x00\x00\x00\xde\xe4\xb7\xe8d\xa9\x95\xbf\xf9\xf5Cl\xb0\xbcI@\xfe', 'value1', 'value2', 'value3'), ...]

The geometry column is given as binary (as stored in the database), however, getting it as GeoJSON representation would be possible by changing the query from SELECT geometry ... FROM ... to SELECT AsGeoJSON(geometry) ... FROM ....

Still, this is miles away from the format I want to achieve in the end.
Any suggestions how this could be done elegantly without having to split the tuple into smallest parts and recomposing them manually to the desired format?

Best Answer

Okay, got it meanwhile. There are probably different ways to do this but this one works fine. Apart from access to your Spatialite database via Python's sqlite3 module and the Spatialite extension, you'll need the geojson module (simply install with pip install geojson).

For the sake of completeness, let's create a new Spatialite database first and fill it with an example row of data. The comments explain in detail what is going on.

import os
import sqlite3
import geojson

# create a new Spatialite database file (and delete any existing of the same name)
if os.path.exists('MyDatabase.sqlite'):
    os.remove('MyDatabase.sqlite')
conn = sqlite3.connect('MyDatabase.sqlite')

# load spatialite extensions for SQLite.
# on Windows: make sure to have the mod_spatialite.dll somewhere in your system path.
# on Linux: to my knowledge the file is called mod_spatialite.so and also should be located in a directory available in your system path.
conn.enable_load_extension(True)
conn.execute('SELECT load_extension("mod_spatialite.dll")')

# create a new table and fill it with some example values
createTableQuery = """
CREATE TABLE
    MyTable(
            geometry,
            firstAttribute,
            secondAttribute,
            thirdAttribute
            )
;
"""

fillTableQuery = """
INSERT INTO
    MyTable(
            geometry,
            firstAttribute,
            secondAttribute,
            thirdAttribute
            )
VALUES
    (
        GeomFromText('POINT(10 20)', 4326), 15, 'some Text', 'some other Text'
    )
;
"""

conn.execute(createTableQuery)
conn.execute(fillTableQuery)
conn.commit()

Now, the next step is to overwrite the row_factory of Python's sqlite3 module to return query results as lists of dictionaries (with one dictionary per result row) instead of list of tuples which is the default behaviour.

# function that makes query results return lists of dictionaries instead of lists of tuples
def dict_factory(cursor, row):
    d = {}
    for idx,col in enumerate(cursor.description):
        d[col[0]] = row[idx]
    return d

# apply the function to the sqlite3 engine
conn.row_factory = dict_factory

Now, define your desired database query, execute it and fetch the results as a list of dictionaries (one dictionary per results row):

# define your database query
getResultsQuery = """
SELECT
    AsGeoJSON(geometry),
    firstAttribute,
    secondAttribute,
    thirdAttribute
FROM
    MyTable
;
"""

# fetch the results in form of a list of dictionaries
results = conn.execute(getResultsQuery).fetchall()

We will now iterate over the results rows, create a single GeoJSON feature for each row and store it in a predefined list. The geometry column of your database table (or whatever it is called) can be directly returned as a GeoJSON string by putting it as argument to Spatialite's AsGeoJSON function in the query. This GeoJSON string is now part of the returned dictionary of each row and can be accessed via the index AsGeoJSON(geometry) (or whatever the column was called). It can then be given to the loads function of the geojson module which will create a serialized GeoJSON geometry of that string.

After deleting the AsGeoJSON(geometry) entry from the row dictionary, what stays is a dictionary holding all the other column values as key-value pairs except for the geometry. Fortunately, the geojson module can create a GeoJSON Feature by passing a GeoJSON Geometry and a dictionary of properties. Each single feature is then appended to the predefined list.

# create a new list which will store the single GeoJSON features
featureCollection = list()

# iterate through the list of result dictionaries
for row in results:

    # create a single GeoJSON geometry from the geometry column which already contains a GeoJSON string
    geom = geojson.loads(row['AsGeoJSON(geometry)'])

    # remove the geometry field from the current's row's dictionary
    row.pop('AsGeoJSON(geometry)')

    # create a new GeoJSON feature and pass the geometry columns as well as all remaining attributes which are stored in the row dictionary
    feature = geojson.Feature(geometry=geom, properties=row)

    # append the current feature to the list of all features
    featureCollection.append(feature)

Finally, the FeatureCollection constructor of the geojson module accepts a list of single GeoJSON Features to unite them into a FeatureCollection:

# when single features for each row from the database table are created, pass the list to the FeatureCollection constructor which will merge them together into one object
featureCollection = geojson.FeatureCollection(featureCollection)

Finally, dump the created FeatureCollection object into a string:

# print the FeatureCollection as string
GeoJSONFeatureCollectionAsString = geojson.dumps(featureCollection)
print(GeoJSONFeatureCollectionAsString)

That's it! Note that the resulting string will not be pretty printed which I do not mind since I am passing it on to a Leaflet/JavaScript function:

{"type": "FeatureCollection", "features": [{"type": "Feature", "properties": {"firstAttribute": 15, "thirdAttribute": "some other Text", "secondAttribute": "some Text"}, "geometry": {"type": "Point", "coordinates": [10, 20]}, "id": null}]}

If you want it pretty-printed, pass the indent argument to the geojson.dumps method:

# print the FeatureCollection as string
GeoJSONFeatureCollectionAsString = geojson.dumps(featureCollection, indent=2)
print(GeoJSONFeatureCollectionAsString)

which gives:

{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "id": null,
      "properties": {
        "thirdAttribute": "some other Text",
        "firstAttribute": 15,
        "secondAttribute": "some Text"
      },
      "geometry": {
        "type": "Point",
        "coordinates": [
          10,
          20
        ]
      }
    }
  ]
}
Related Question