[GIS] Set encoding when using ogr2ogr to export from Oracle db to MapInfo table

convertencodingogr2ogroracle-dbms

I am trying to automate the export of spatial data from a oracle database to mapinfo .tab files. Using this thread I made my script. However I can't see how to set the encoding right.

My data have latin1 encoding. But this is not the case in the output file. I get warnings regarding invalid characters when running the script and Local characters like æøå is not intact in the output.

This is my code:

REM Unsuccessful attempt to change encoding: 
SET PGCLIENTENCODING = LATIN1
SET OGR_FORCE_ASCII = NO

C:\OSGeo4W\bin\ogr2ogr.exe -f "MapInfo File" C:\OUTPUT\PATH\test.TAB OCI:"<USERNAME>/<LOGIN>@(DESCRIPTION = (ADDRESS_LIST = (ADDRESS = (PROTOCOL = TCP)(HOST = <HOST>)(PORT = <PORT>)))(CONNECT_DATA = (SID =<SID-NUMBER>))):<TABLES> 

Best Answer

The OGR driver for Oracle (OCI driver) accesses the database just like any other client, and so it's behavior regarding character set is the same as for any other client.

In particular, the character set in which the database will send the strings to the client (here ogr2ogr via its OCI plugin) is influenced by the setting of the NLS_LANG environment variable. On windows, that variable (like others) is in the registry. It may be that this variable is incorrectly set up in your environment, or maybe it is missing entirely if you use the light-weight Oracle "thin client".

Local characters like æøå is not intact in the output.

Do you mean they come out as "?" characters ? If so that probably means that you do not have any NLS_LANG set, and so the client library applies the default of US7ASCII (which is unable to encode those characters).

Confirm first that your database is correctly setup with the proper character set. What does the following show ?

SQL> select value from nls_database_parameters where parameter='NLS_CHARACTERSET';

If your database is on Windows and you do have it set up for ISOLatin1 then you should get something like WE8MSWIN1252. If it is on Unix/Linux (also ISOLatin1) then you should see WE8ISO8859P1.

Try the following: before running ogr2ogr set the following:

SET NLS_LANG=AMERICAN_AMERICA.WE8MSWIN1252

That will make sure the OCI plugin gets the strings in ISOlatin1. And you then can only hope that ogr proper does not play any tricks with those strings and passes them on to the Mapinfo plugin (hoping too that that also lets them through unadulterated).

PGCLIENTENCODING obviously has no effect: it is only for PostgreSQL

OGR_FORCE_ASCII is also best left out: according to the doc, it will replace all non-ASCII7 characters with "?" - precisely what you do not want. But then again it only applies to XML and derived formats - not your case anyway.