[GIS] ASCII encoding error when updating field content in SQL Server table

arcgis-10.2, arcpy, ascii, sql server, unicode

I know there are plenty of posts and blog explanations about unicode errors, but I still can't figure out how to handle it in my particular case. So here is my problem: I am writing a Python script to update records in a SQL Server table with field content from a shapefile that has been edited in ArcPad.

I use an arcpy.da.UpdateCursor to update existing records.

I've put # -*- coding: cp1252 -*- on top of my script as I work with French characters.
Just doing so, I get the ASCII encoding error:

'ascii' codec can't encode characters in position 0-1: ordinal not in range(128)

I've tried putting u before the field content:

...
with arcpy.da.UpdateCursor(DBtable, DBFields, where_clause) as DBCur:
    for DBrow in DBCur:
        ...
        DBrow[8] = u"{} - {}: {}".format(DBrow[8], date , AProw[8]) # AProw comes from a SearchCursor reading the shapefile.

I can then print a message with the text content, and it doesn't return an error message, but all characters are replaced (I get something like ???4???5????????>???5??? in my database field).

Any accented or punctuation character can be present in the fields (there's a free-text comment field), so I don't want to check for every possible non-ASCII character and replace it.

I work with ArcGIS 10.2.2, the shapefile is edited in ArcPad 10.2 and the database is SQL Server 2008 R2.

What am I missing?

EDIT: This only occurs if the destination database is SQL Server. No problem with a file gdb. I have to add that the SQL Server table already contains non-ASCII characters.
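
For readers unfamiliar with the error quoted above, here is a minimal sketch of what it means. It assumes the usual cause: ArcGIS 10.2 runs Python 2, where formatting a unicode value into a plain byte string makes Python encode it with the ASCII codec behind the scenes. The sketch does the encoding explicitly so that it behaves the same under Python 2 and 3; the helper name ascii_error is made up for illustration and is not part of arcpy.

```python
# -*- coding: utf-8 -*-
# Illustration only: explicit encoding that reproduces the error message.

def ascii_error(value):
    """Return the error message raised when encoding to ASCII, or None."""
    try:
        value.encode("ascii")
    except UnicodeEncodeError as exc:
        return str(exc)
    return None

text = u"\u00e9t\u00e9"                 # the French word for "summer"

message = ascii_error(text)            # ASCII cannot represent the accents
utf8_bytes = text.encode("utf-8")      # two bytes per accented letter
cp1252_bytes = text.encode("cp1252")   # one byte per accented letter

print(message)
print(repr(utf8_bytes), repr(cp1252_bytes))
```

The same text gives different bytes under UTF-8 and cp1252, so whichever side reads those bytes back has to assume the same encoding that was used to write them.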

Best Answer

I finally found a solution by making all of the following changes to my code:

Use arcpy.ArcSDESQLExecute() instead of arcpy.da Insert/Update cursors to update/populate the table (I had to do both operations, and neither type of cursor worked in all situations).
Use # -*- coding:utf8 -*- at the beginning of the script, which is supposed to be good practice anyway (# -*- coding: cp1252 -*- lets me have accents in the messages but doesn't work for editing the database table). Now accents in my messages are replaced, but editing my database works.
Escape single quotes, which were being recognized as string delimiters within my SQL request. To deal with this I had to replace each of them with two single quotes.

So now my code looks like this:

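# DBrow[8] is the comment already present in the table and AProw[8] is the
# new comment from my shapefile edited in ArcPad. Single quotes are doubled
# so they are not read as string delimiters in the SQL statement.
old_comment = DBrow[8].replace("'", "''").encode('utf-8')
added_comment = AProw[8].replace("'", "''").encode('utf-8')
new_comment = "{} - {}: {}".format(old_comment, date, added_comment)
...
sde_conn = arcpy.ArcSDESQLExecute(DB)
# Note: as written, without a WHERE clause, this updates every row of the table.
sql = '''
update {} set {} = '{}'
'''.format(DBtable, DBcomment_field, new_comment)
sde_conn.execute(sql)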
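
The quote-doubling step above can be sketched as a small standalone helper. The function names and the table and field names below are hypothetical, chosen only for illustration, and the sketch builds the SQL text without connecting to any database.

```python
# -*- coding: utf-8 -*-
# Hypothetical helpers: only the quote-doubling rule comes from the answer.

def quote_sql_literal(value):
    """Wrap text in single quotes, doubling any embedded single quote."""
    return u"'{}'".format(value.replace(u"'", u"''"))

def build_update(table, field, value):
    """Build an UPDATE statement that sets one field to a quoted literal."""
    return u"update {} set {} = {}".format(table, field, quote_sql_literal(value))

# "dbo.Comments" and "comment" are made-up names.
sql = build_update(u"dbo.Comments", u"comment", u"l'\u00e9t\u00e9")
print(sql)
```

Doubling quotes by hand is easy to get wrong, so a parameterized query is generally safer wherever the database API offers one. In SQL Server a Unicode literal is normally written with an N prefix (N'...'); whether that matters here depends on the column type, which the post does not state.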

I hope this helps anyone who runs into this kind of unicode/SQL Server headache.

Related Solutions

ArcPy – Replacing Non-English Characters in Attribute Tables

I, too, quite often deal with special characters such as those in Swedish (ä, ö, å), as well as others found in languages such as Portuguese and Spanish (é, í, ú, ó, etc.). For instance, I have data where the name of a city is written in plain Latin with all the accents removed, so "Göteborg" becomes "Goteborg" and "Åre" becomes "Are". In order to perform joins and match the data, I have to replace the accented characters with their plain Latin equivalents.

I used to do this the way you've shown in your own answer, but that logic soon became rather cumbersome to maintain. Now I use the unicodedata module, which ships with the Python installation, and arcpy for iterating over the features.

import unicodedata
import arcpy
import os

def strip_accents(s):
    # Decompose each character (NFD), then drop the combining marks ('Mn').
    return ''.join(c for c in unicodedata.normalize('NFD', s)
                   if unicodedata.category(c) != 'Mn')

arcpy.env.workspace = r"C:\TempData_processed.gdb"
workspace = arcpy.env.workspace

in_fc = os.path.join(workspace,"FC")
fields = ["Adm_name","Adm_Latin"]
with arcpy.da.UpdateCursor(in_fc,fields) as upd_cursor:
    for row in upd_cursor:
        row[1] = strip_accents(u"{0}".format(row[0]))
        upd_cursor.updateRow(row)

For more information about using the unicodedata module, see "What is the best way to remove accents in a python unicode string?"
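
To try the accent stripping without arcpy or a geodatabase, here is a minimal standalone sketch of the same technique. The sample names are made up for illustration.

```python
# -*- coding: utf-8 -*-
import unicodedata

def strip_accents(text):
    """Decompose characters (NFD) and drop the combining marks (category Mn)."""
    decomposed = unicodedata.normalize("NFD", text)
    return u"".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")

# Sample values chosen for illustration.
names = [u"G\u00f6teborg", u"\u00c5re", u"S\u00e3o Paulo"]
latin = [strip_accents(name) for name in names]

# Letters that are not base letter + combining mark are left untouched.
unchanged = strip_accents(u"\u00f8")

print(latin)
```

One limitation: this only removes marks that Unicode treats as separable from a base letter. A character such as "ø" has no such decomposition and passes through unchanged, so it needs an explicit replacement or a transliteration library.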

Python Shapefile – How to Find and Replace Unicode Character in a Shapefile

GetField returns UTF-8 encoded strings, and you'll want to decode them before you process them in any way. Then you encode the result to pass it to SetField. You've got it backwards.
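
The decode, process, encode order described above can be sketched without OGR at all. The byte string below stands in for what a field getter might return (an assumption for illustration), and the replacement is deliberately trivial.

```python
# -*- coding: utf-8 -*-
# Stand-in for a UTF-8 encoded value read from a field (assumed input).
raw = b"C\xc3\xb4te d'Ivoire"

# 1. Decode the bytes to a unicode string before touching the content.
text = raw.decode("utf-8")

# 2. Process the unicode string (here: replace one accented letter).
cleaned = text.replace(u"\u00f4", u"o")

# 3. Encode the result again before handing it back to a byte-oriented API.
encoded = cleaned.encode("utf-8")

print(repr(encoded))
```

Doing the replacement on the raw bytes instead would mean matching the two-byte sequence for "ô" rather than the character itself, which is where this kind of code usually goes wrong.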

Fiona (shameless plug) deals in Python unicode strings and so is simpler to use.

Unidecode (https://pypi.python.org/pypi/Unidecode) is handy for stuff like this because it will make sensible transliterations and romanizations for many languages. It looks like it would make the ones you want.

>>> from unidecode import unidecode
>>> unidecode(u'\u00c2')
'A'
>>> unidecode(u'\u00C9')
'E'
>>> unidecode(u'\u00C8')
'E'

The example below uses Natural Earth data and converts "Côte d'Ivoire" to "Cote d'Ivoire", etc, without presuming anything about the characters in the source data.

import fiona
from unidecode import unidecode

with fiona.open(
        '/Users/seang/data/ne_50m_admin_0_countries/'
        'ne_50m_admin_0_countries.shp', 'r') as source:

    # Create an output shapefile with the same schema,
    # coordinate systems. ISO-8859-1 encoding.
    with fiona.open(
            '/tmp/transliterated.shp', 'w',
            **source.meta) as sink:

        # Identify all the str type properties.
        str_prop_keys = [
            k for k, v in sink.schema['properties'].items()
                if v.startswith('str')]

        for rec in source:

            # Transliterate and update each of the str properties.
            for key in str_prop_keys:
                val = rec['properties'][key]
                if val:
                    rec['properties'][key] = unidecode(val)

            # Write out the transformed record.
            sink.write(rec)
