Changing Comma to Decimal for CSV Import in PostgreSQL – How to Guide

csvdatabasepostgispostgresql

I try to import a CSV file into a PostgreSQL database that runs on a remote server.
I use psql via SSH.

AIS_DB=# CREATE TABLE bet_srk_2008 (
AIS_DB(# Unique_VesselTraject numeric,
AIS_DB(# AISTrajectPositionTimestampLocal timestamp,
AIS_DB(# RandomVesselAISKey varchar,
AIS_DB(# AISTrajectPositionSpeedOverGround varchar,
AIS_DB(# AISTrajectPositionCourseOverGround varchar,
AIS_DB(# VesselAISLength varchar,
AIS_DB(# VesselAISBeam varchar,
AIS_DB(# VesselAISMaximumDraught varchar,
AIS_DB(# VesselTypeAISDescription varchar,
AIS_DB(# BOZ varchar,
AIS_DB(# AISTrajectPositionLatitude numeric,
AIS_DB(# AISTrajectPositionLongitude numeric,
AIS_DB(# TrajectAISDestination varchar
AIS_DB(# );  

CREATE TABLE

AIS_DB=# \COPY bet_srk_2008 FROM '/home/thecaptain/Export2008.csv' DELIMITERS ';'CSV HEADER;  

ERROR: invalid input syntax for type numeric: "51,2628135681152"
CONTEXT: COPY bet_srk_2008, line 2, column aistrajectpositionlatitude: "51,2628135681152"

I think the problem is that the data uses a comma instead of point as a decimal. Is there a way to circumvent this in the import process?
The datasets have 100 million+ lines and a find and replace is making nano run out of memory.
If I use VARCHAR instead of NUMERIC it won't let me ST_MAKEPOINT it to the geom column.

This is the whole text:

CREATE TABLE bet_srk_2008 (
Unique_VesselTraject numeric,
AISTrajectPositionTimestampLocal timestamp,
RandomVesselAISKey varchar,
AISTrajectPositionSpeedOverGround varchar,
AISTrajectPositionCourseOverGround varchar,
VesselAISLength varchar,
VesselAISBeam varchar,
VesselAISMaximumDraught varchar,
VesselTypeAISDescription varchar,
BOZ varchar,
AISTrajectPositionLatitude numeric,
AISTrajectPositionLongitude numeric,
TrajectAISDestination varchar
);

\COPY bet_srk_2008 FROM '/home/thecaptain/Export2008.csv' DELIMITERS 
';'CSV HEADER;

ALTER TABLE bet_srk_2008 ADD COLUMN geom geometry (POINT, 4326 );

UPDATE bet_srk_2008 SET geom 
ST_SetSRID(ST_MakePoint(AISTrajectPositionLongitude, 
AISTrajectPositionLatitude) ,4326);

CREATE INDEX bet_srk_2008_index ON bet_srk_2008 USING GIST ( geom );

Best Answer

Ok I found a solution for anybody that has the same problem.
I am still interested if somebody has a solution without having to alter the dataset prior to importing.

I use Linux as a operating system on desktop and servers. I changed the comma to point:

sed -i 's/,/./g' Export2008.csv
Related Question