[GIS] pgr_createTopology with large datasets

pgroutingrouting

I have a road network with 8.5 million edges but when I run pgr_createTopology it processes edges at a rate of about 1000 edges per second up until 360,000 edges where it slows down and eventually fails with the following error.

server closed the connection unexpectedly
  This probably means the server terminated abnormally
  before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

I am running postgres 10.1 postgis 2.4 pgrouting 2.5. What would like cause it to fail like this? Or is there any way that I could process the network in 100,000 edge chunks?

Best Answer

The following is what I am using. Some of it is specific to our deployment environment since we are using docker and some bash scripts to deploy and set up the server. You could easily get rid of all the argeparse/os.getenv and hardcode the connection if you wanted.

import argparse
from os import getenv
import psycopg2

parser = argparse.ArgumentParser()
parser.add_argument("-H", "--host", help="host location of postgres database", type=str)
parser.add_argument("-U", "--user", help="username to connect to the database", type=str)
parser.add_argument("-d", "--dbname", help="database name", type=str)
parser.add_argument("-p", "--port", help="port to connect to postgres", type=str)
args = parser.parse_args()
password = getenv('POSTGRES_PASSWORD')

conn = psycopg2.connect(
    f"dbname={args.dbname} user={args.user} host={args.host} port={args.port} password={password}"
)
cur = conn.cursor()
print("connected to database")

cur.execute("SELECT MIN(id), MAX(id) FROM ways;")
min_id, max_id = cur.fetchone()
print(f"there are {max_id - min_id + 1} edges to be processed")
cur.close()

interval = 200000
for x in range(min_id, max_id+1, interval):
    cur = conn.cursor()
    cur.execute(
    f"select pgr_createTopology('ways', 0.000001, 'the_geom', 'gid', rows_where:='id>={x} and id<{x+interval}');"
)
    conn.commit()
    x_max = x + interval - 1
    if x_max > max_id:
        x_max = max_id
    print(f"edges {x} - {x_max} have been processed")

cur = conn.cursor()
cur.execute("""ALTER TABLE ways_vertices_pgr
  ADD COLUMN IF NOT EXISTS lat float8,
  ADD COLUMN IF NOT EXISTS lon float8;""")

cur.execute("""UPDATE ways_vertices_pgr
  SET lat = ST_Y(the_geom),
      lon = ST_X(the_geom);""")

conn.commit()
Related Question