[GIS] Postgresql INSERT INTO table WHERE

postgispostgresqlpython

I'm working with a large dataset and I prefer to handle it within a postgres database with postgis. The data is looped in python into a long string (or multiple if one string gets too big). I would like to add a lot of data if the data (in my case column "pos") is within a polygon.

The dataset is much bigger, hence this is an example of what I would like to do:

INSERT INTO I_test_test (pos, row_id) 
VALUES(ST_PointFromText('POINT(13.55259726 55.39571705)', 4326), 1, 
       ST_PointFromText('POINT(13.55387947 55.39514755)', 4326), 2) 
WHERE st_intersects(pos, ST_GeomFromText('POLYGON((13.5542 55.3956,13.5543 55.3952,13.5586 55.3949,13.5587 55.3954,13.5542 55.3956))',4326))

I could write an sql question and check every "pos" before I add them into the INSERT INTO, hence since it is 10 000 – 1 000 000 rows that I would like to add it would take forever…

Any suggestions?

Best Answer

Echoing @Freight_Train, get the data into Postgres first, then do query/insert.

I would get the points (or the source data) loaded into Postgres first.

And if possible, get the source of the "extra" data loaded into Postgres.

Then build a table with geometry from those and put a spatial index on the geometry column.

Then you could build 2 INSERT statements: 1 for the data outside your poly. And another for data outside your polygon.

I usually workout my approach with a series of CTEs w/ sample data, then figure out how to load in via Copy: https://stackoverflow.com/questions/2987433/how-to-import-csv-file-data-into-a-postgresql-table. Example below is with a literal CSV, so you'd need to substitute that with whatever format your data is.

with source_point_data as (
    select E'13.55259726,55.39571705,1\n13.55387947,55.39514755,2'::text as csv
)
,extra_source_data as (
    /* Substitute w/ actual data */
    select generate_series as id
    FROM generate_series(1,100)
)
, query_geom as (
    SELECT ST_GeomFromText('POLYGON((13.5542 55.3956,13.5543 55.3952,13.5586 55.3949,13.5587 55.3954,13.5542 55.3956))',4326) as geom   
)
,rows_from_csv as (
    select unnest as val
    FROM 
        (SELECT unnest(string_to_array(csv,E'\n')) FROM source_point_data) t
)
,parsed_rows as (
    select  array_vals[1]::float as y
        ,array_vals[2]::float as x
        ,array_vals[3]::integer as id
    from (
    select string_to_array(val,E',') as array_vals from rows_from_csv) t

)
,construct_geom as (

    SELECT 
        ST_SetSRID(ST_Point(x, y),4326) as geom
        ,id
    FROM parsed_rows    
)
,insert_within_poly as (

    /* JOIN to whatevever extra_source_data is and insert the extra columns */

    /* Insert into temp table or the actual table */
    INSERT INTO tmp_insert_within_poly --(geom,id,extra_col1,extra_col2...)  
    SELECT geom, id
    FROM construct_geom
    WHERE st_intersects(geom, (select geom from query_geom))
    RETURNING * 
)
,insert_outside_poly as (

    /* Insert into temp table or the actual table */
    INSERT INTO tmp_insert_outside_poly
    SELECT geom, id
    FROM construct_geom
    WHERE st_intersects(geom, (select geom from query_geom)) = false
    RETURNING *
)
select 
jsonb_build_object(
    'insert_within_poly', (SELECT array_to_json(array_agg(t)) FROM (SELECT * FROM insert_within_poly) t)
    ,'insert_outside_poly', (SELECT array_to_json(array_agg(t)) FROM (SELECT * FROM insert_outside_poly) t)
)
;