[GIS] Queries returning very big datasets in PostGIS

large-datasets, postgis, sql

I have a PostGIS query that will return several million rows:

SELECT
 t1.id AS id1,
 t2.id AS id2,
 ABS(t1.mean_h - t2.mean_h) AS h_diff,
 ST_Distance(t1.the_geom, t2.the_geom) AS dist
FROM tas_ponds AS t1, tas_ponds AS t2
WHERE
 t1.gid > t2.gid AND
 ST_DWithin(t1.the_geom, t2.the_geom, 17000);

When run in psql, the query fails with an "out of memory for query result" error.

Googling suggests that this is an error within psql rather than Postgres/PostGIS. Would amending the query into the form SELECT ... INTO x FROM ... fix the problem? Are there any other recommended approaches for dealing with very large datasets?
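For reference, the SELECT ... INTO form I have in mind would look something like this (pond_pairs is just a placeholder name for the new table):

-- Write the result set into a new table instead of returning it to psql,
-- so the client never has to buffer the rows.
SELECT
 t1.id AS id1,
 t2.id AS id2,
 ABS(t1.mean_h - t2.mean_h) AS h_diff,
 ST_Distance(t1.the_geom, t2.the_geom) AS dist
INTO pond_pairs
FROM tas_ponds AS t1, tas_ponds AS t2
WHERE
 t1.gid > t2.gid AND
 ST_DWithin(t1.the_geom, t2.the_geom, 17000);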

Best Answer

Some poking around confirms this is a Postgres client problem, independent of spatial or server considerations: psql buffers the entire result set in client memory before displaying it, and a multi-million-row result exceeds that buffer.

The recommended way to handle this is to DECLARE a cursor over the query and FETCH from it in blocks smaller than the total result set. You could also create a view containing the computed components of the query (e.g. the distance) to cut down on the memory needed for the query operation itself.
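A minimal sketch of both suggestions in psql, assuming the query from the question; the view name pond_pairs_view, the cursor name pond_cur, and the block size of 10,000 rows are arbitrary placeholders:

BEGIN;

-- Optional: wrap the computed components (e.g. the distance) in a view
-- so the cursor definition stays simple.
CREATE VIEW pond_pairs_view AS
SELECT
 t1.id AS id1,
 t2.id AS id2,
 ABS(t1.mean_h - t2.mean_h) AS h_diff,
 ST_Distance(t1.the_geom, t2.the_geom) AS dist
FROM tas_ponds AS t1, tas_ponds AS t2
WHERE
 t1.gid > t2.gid AND
 ST_DWithin(t1.the_geom, t2.the_geom, 17000);

-- Cursors must be declared inside a transaction; the server then sends
-- rows to the client only as they are fetched.
DECLARE pond_cur CURSOR FOR SELECT * FROM pond_pairs_view;

-- Pull the result set in blocks; repeat the FETCH until it returns no rows.
FETCH 10000 FROM pond_cur;

CLOSE pond_cur;
COMMIT;

Each FETCH materialises only that block on the client, so psql's memory use stays bounded no matter how many rows the query produces.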
