To measure distances in Km you need to transform your geometry's coordinates from long/lat (spherical) to x/y (planar). In my example I am using UTM zone 35N, EPSG 32635 - you will have to pick something appropriate based on your data's location.
The query below can be optimized by avoiding the st_transform
call (e.g. by creating an intermediate table with geometry in EPSG 32635).
The other thing that can be improved is that distances are computed as a query over a cartesian product (the table is joined with itself) so you unnecessarily compute both the distance from A to B and the distance from B to A.
select a.outletname, b.outletname,
st_distance(st_transform(a.geom, 32635), st_transform(b.geom, 32635))
from
table_name a,
table_name b,
where a.outletname != b.outletname
and a.outletname in ('A', 'B', 'C', 'D')
and b.outletname in ('A', 'B', 'C', 'D')
and st_distance(st_transform(a.geom, 32635), st_transform(b.geom, 32635)) > 50000
order by st_distance(st_transform(a.geom, 32635), st_transform(b.geom, 32635)) desc;
whuber's idea with the convex hull makes sense when the data set is large enough to prevent computing distances on the cartesian product.
I've checked on my dataset and, as expected, the greatest distance is equal when computed by the two different methods.
For reference, the distances between the convex hull points query:
select st_distance(a.geom, b.geom) from
(select (st_dumppoints(st_convexhull(st_collect(st_transform(way, 32635))))).geom from planet_osm_point a where a.amenity='pharmacy' and st_contains((select way from planet_osm_polygon where name='Sector 1'), a.way)) a,
(select (st_dumppoints(st_convexhull(st_collect(st_transform(way, 32635))))).geom from planet_osm_point a where a.amenity='pharmacy' and st_contains((select way from planet_osm_polygon where name='Sector 1'), a.way)) b
order by st_distance(a.geom, b.geom) desc;
There's an additional problem with the convex hull approach - it works on aggregated geometry hence it anonymizes which point belongs to which feature, making it more difficult to compute outlet names.
The solution would probably be to compute the convex hull and then identify (by spatial queries), which feature geometries belong to the convex hull and run the distance query on those features only.
E.g.:
The geometry of the first point of the convex hull:
osm=# select (st_dumppoints(st_convexhull(st_collect(way)))).geom from planet_osm_point a where a.amenity='pharmacy' limit 1;
geom
----------------------------------------------------
010100002031BF0D00D120685EC2064241D3F3D3FDA9AE5541
(1 row)
The corresponding feature from the table:
osm=# select name, osm_id, way from planet_osm_point a where a.way = '010100002031BF0D00D120685EC2064241D3F3D3FDA9AE5541';
name | osm_id | way
-----------+------------+----------------------------------------------------
Dentafarm | 1395848741 | 010100002031BF0D00D120685EC2064241D3F3D3FDA9AE5541
(1 row)
Read this answer to Nearest facility with pgRouting which is a quite similar case. If it will not then some information that would help is:
- Size of your source and target tables
- Size of your table with road network
- Whether it is OK to find nearest point euclidean and then count road distance
One of resolutions is (pseudo-code):
Create table with distance from every source to every target:
Create table distances_tmp as
SELECT
sources.id as source_id
nearest_node(sources.x, sources.y) as source_node
targets.id as target_id
nearest_node(targets.x, targets.y) as target_node
give_me_cost(sources.x, sources.y,targets.x, targets.y) as cost
FROM
sources
JOIN
targets on 1=1
Create table with only lowest costs
Create table distances as
select *
from distances_tmp d1
where not exists (select 1 from distances_tmp d2 where d2.cost < d1.cost)
I wrote it in two steps cause it's easier to understand it, but there is some ways to join this two queries into one (eg. select from select or WITH clause)
Also consider changes in nearest_node function - nearest vertex of nearest edge is not always best place to start (or stop) routing if nearest edge is one-way street, so your cost will not always be accurate.
Best Answer
If it is just 2000 records and only points (no complex polygons or linestrings) it shouldn't take many milliseconds to calculate.
If it would have been a few million points to check distance between you could consider using knn-distance with PostgreSQL 9.1 and PostGIS 2.0.
But I think you should do fine with something like?
The bottleneck in this query will not be the spatial calculation since it is only three distance calculations between points, but to find the OutletName fast. Because of that you will maybe save a efw milliseconds if you put an index n the OutletName column.
edit:
From Whubers comment I guess I am misunderstanding something here, but I let it be for now.
HTH
Nicklas