Psycopg2 cursor.copy_expert() BadCopyFileFormat: extra data after last expected column when writing WKB data

binarypostgispythonstringwell-known-binary

I am trying to write a WKB representation (i.e. a binary string) of a Shapely Polygon into a PostgreSQL/PostGIS database using psycopg2 cursor.copy_expert() for a bulk insert.

I narrowed down the issue to the following:

import psycopg2
from io import StringIO
from shapely import wkb
from shapely.geometry import Polygon

conn = psycopg.connect(**db_params)
curs = conn.cursor()

# creation of the tmp_table:
curs.execute(f"""CREATE TABLE IF NOT EXISTS tmp_table (
        id INT NOT NULL GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
        geom_field geometry(PolygonZ, 4326)
    );""")
conn.commit()

shape = np.array([
    [1.60, 1.11, 1.12],
    [1.63, 1.10, 1.16],
    [1.64, 1.14, 1.09]])

polygon_wkb = Polygon(shape).wkb

f = StringIO()
f.write(f"{polygon_wkb}\n")
f.seek(0)

curs.copy_expert(f"COPY tmp_table (geom_field) FROM STDIN WITH CSV DELIMITER ',';", f)
conn.commit()

=>

InvalidTextRepresentation: invalid input syntax for type integer: "b'\x01\x03\x00\x00\x80\x01\x00\x00\x00\x04\x00\x00\x00\x9a\x99\x99\x99\x99\x99\xf9?\xc3\xf5(\\\x8f\xc2\xf1?\xecQ\xb8\x1e\x85\xeb\xf1?\x14\xaeG\xe1z\x14\xfa?\x9a\x99\x99\x99\x99\x99\xf1?\x8f\xc2\xf5(\\\x8f\xf2?=\n\xd7\xa3p=\xfa?=\n\xd7\xa3p=\xf2?q=\n\xd7\xa3p\xf1?\x9a\x99\x99\x99\x99\x99\xf9?\xc3\xf5(\\\x8f\xc2\xf1?\xecQ\xb8\x1e\x85\xeb\xf1?'"
CONTEXT:  COPY tmp_table, line 1, column id: "b'\x01\x03\x00\x00\x80\x01\x00\x00\x00\x04\x00\x00\x00\x9a\x99\x99\x99\x99\x99\xf9?\xc3\xf5(\\\x8f\x..."

So I figured out from this error that I have to replaced f.write(f"{polygon_wkb}\n") by f.write(f"1,{polygon_wkb}\n") but then I face this strange error:

=>

BadCopyFileFormat: extra data after last expected column
CONTEXT:  COPY tmp_table, line 1: "1,b'\x01\x03\x00\x00\x80\x01\x00\x00\x00\x04\x00\x00\x00\x9a\x99\x99\x99\x99\x99\xf9?\xc3\xf5(\\\x8f..."

I cannot see what's wrong here, my table has only two fields:

  • an id as primary key
  • and a geometry(PolygonZ) for storing the WKB geometry.

I've got the same error if I dump the geometry to a WKT representation.

Last but not least, the same error occurs when leaving the id out of the query (because it's an identity field, hence it can be left to PostgreSQL to manage it) by specifying only the geometry field geom_field:

(...)
f.write(f"{polygon_wkb}\n")
(...)
curs.copy_expert(f"COPY tmp_table (geom_field) FROM STDIN WITH CSV DELIMITER ',';", f)

=>

BadCopyFileFormat: extra data after last expected column
CONTEXT:  COPY tmp_table, line 1: "b'\x01\x03\x00\x00\x80\x01\x00\x00\x00\x04\x00\x00\x00\x9a\x99\x99\x99\x99\x99\xf9?\xc3\xf5(\\\x8f\x..."

The WKB string is as follow when I print it in the console:

print(polygon_wkb)
b'\x01\x03\x00\x00\x80\x01\x00\x00\x00\x04\x00\x00\x00\x9a\x99\x99\x99\x99\x99\xf9?\xc3\xf5(\\\x8f\xc2\xf1?\xecQ\xb8\x1e\x85\xeb\xf1?\x14\xaeG\xe1z\x14\xfa?\x9a\x99\x99\x99\x99\x99\xf1?\x8f\xc2\xf5(\\\x8f\xf2?=\n\xd7\xa3p=\xfa?=\n\xd7\xa3p=\xf2?q=\n\xd7\xa3p\xf1?\x9a\x99\x99\x99\x99\x99\xf9?\xc3\xf5(\\\x8f\xc2\xf1?\xecQ\xb8\x1e\x85\xeb\xf1?'

it seems to contain \n sequences. Is it possible that this is causing some trouble with the new line interpreter?

How could I fix that?

Info:

Python: 3.10.4 (main, Apr 2 2022, 09:04:19) [GCC 11.2.0]

psycopg2: '2.9.3 (dt dec pq3 ext lo64)'

shapely: '1.8.1.post1'

PG:14 / PostGIS:3.2

Best Answer

The solution was to use .wkb_hex in order to have a plain string (not a binary one, which would have been OK though for a usage by ST_GeomFromWKB()):

import psycopg2
from io import StringIO
from shapely import wkb
from shapely.geometry import Polygon

conn = psycopg.connect(**db_params)
curs = conn.cursor()

# creation of the tmp_table:
curs.execute(f"""CREATE TABLE IF NOT EXISTS tmp_table (
        id INT NOT NULL GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
        geom_field geometry(PolygonZ, 4326)
    );""")
conn.commit()

shape = np.array([
    [1.60, 1.11, 1.12],
    [1.63, 1.10, 1.16],
    [1.64, 1.14, 1.09]])

polygon_wkb = Polygon(shape).wkb_hex # <- here, use .wkb_hex instead of just .wkb

f = StringIO()
f.write(f"{polygon_wkb}\n")
f.seek(0)

curs.copy_expert(f"COPY tmp_table (geom_field) FROM STDIN WITH CSV DELIMITER ',';", f)
conn.commit()

Not very well documented, but here one can see it's a standard string: https://shapely.readthedocs.io/en/stable/manual.html

wkb_hex from shapely doc