PostGIS – Dynamic SQL String Formatting in Python with Psycopg2 for GeoPandas

geopandaspostgispostgresqlpsycopg2

I am trying to develop a relatively generic function in Python that will filter any PostGIS table in a database using an arbitrary mask (a GeoPandas GeoDataFrame) and the ST_Intersects argument, but I am running into a query formatting problem. I am using the psycopg2 library combined with GeoPandas. After establishing a connection with the database I do the following:

cursor = conn.cursor() # make cursor from connection

mask.to_postgis('tmp_boundary', self.engine, if_exists='replace') # adding a mask to the database that will be used to filter "some_table" below

table_name = 'some_table' # this is the name of the table that I want filtered
table_name_w_geom = table_name + '.geometry' # adding geometry column to table name string

base_query = "SELECT * FROM tmp_boundary INNER JOIN {} ON ST_Intersects({},tmp_boundary.geometry);"
query = sql.SQL(base_query).format(sql.Identifier(table_name), sql.Identifier(table_name_w_geom))
query = query.as_string(cursor)

filtered_gdf = gpd.read_postgis(query, engine, geom_col='geometry', crs=4326)

But, I keep getting this error:

sqlalchemy.exc.ProgrammingError: (psycopg2.errors.UndefinedColumn) column "some_table.geometry" does not exist
LINE 1: ...p_boundary INNER JOIN "some_table" ON ST_Intersects("some_table....
                                                             ^
[SQL: SELECT * FROM tmp_boundary INNER JOIN "some_table" ON ST_Intersects("some_table.geometry",tmp_boundary.geometry);]
(Background on this error at: https://sqlalche.me/e/14/f405)
python-BaseException

I also get an error if I run this query at a postgresql prompt. But, if I remove the double quotes from the query it works fine from the postgresql prompt, i.e.,:

SELECT * FROM tmp_boundary INNER JOIN some_table ON ST_Intersects(some_table.geometry,tmp_boundary.geometry);

So, it looks like the double quotes around the table_name and table_name_w_geom are the issue, but I do not know how to remove those using psycopg2. What is the appropriate way to get this to format properly using psycopg2 for use in geopandas (or other libraries)?

Best Answer

Your string formatting is resulting in a double-quotation around both the table and field ("some_table.geometry"), which PostgreSQL interprets as a field name. Use string formatting rather than string concatenation. Also, I'm not sure why you're including those SQLAlchemy steps. You should be able to pass the query string directly to GeoPandas.

table_name = 'some_table' # this is the name of the table that I want filtered
geom_field = 'geometry' # adding geometry column to table name string

sql = "SELECT * FROM tmp_boundary INNER JOIN {0}.{1} ON ST_Intersects({0}.{1},tmp_boundary.geometry);".format(table_name,geom_field)

filtered_gdf = gpd.read_postgis(sql, engine, geom_col='geometry', crs=4326)