PostGIS and Geopandas – Querying by Polygon Boundary

geopandaspostgis

I am really new to PostGIS (and PostgreSQL, for that matter) and I am having trouble figuring out how to query an existing table in a PG database with a polygon. I think I have identified the appropriate PG statement to use (ST_Contains) but it is not clear to me how I create the arguments for the statement from a geopandas dataframe. I'll walk you through a simple example of what I have thus far:

First, I load the relevant libraries:

from sqlalchemy import create_engine
import geopandas as gpd

Then I load the geodataframe I want to use to query the db:

poly = gpd.read_file('polygon.shp')

I have a database called 'my_db' and I create the connection:

dbname='my_db'
engine = create_engine(f'postgresql://user:password@localhost:5432/{dbname}')

This is where I get stuck.

How do I enter the geopandas geometries from "poly" into a format that PostGIS will understand?

sql = f'SELECT * FROM {dbname} WHERE ST_contains(poly[????], poly[????])'
queried_gdf = gpd.read_postgis(sql, engine, geom_col='geometry')

Best Answer

You're mixing and matching in a weird way. You either want to store your spatial data in PostGIS or as a GeoDataFrame. If you want to store the data as a PostGIS table, you can use GeoPandas to import the data into PostGIS, and at that point you have the entire PostgreSQL/PostGIS querying engine at your disposal:

poly = gpd.read_file('polygon.shp')
engine = create_engine(f'postgresql://user:password@localhost:5432/{dbname}')
poly.to_postgis('my_table',engine)

At this stage, you have the data stored natively in a PostGIS table. Your next step depends on what you're trying to do. ST_Contains kind of suggests you have a bounding box or some other spatial feature in mind, so you could store that info in another table (say, 'bbox') and do:

sql = 'SELECT * FROM my_table a INNER JOIN bbox b ON ST_Contains(b.geom,a.geom)'
queried_gdf = gpd.read_postgis(sql, engine)