Understanding UnicodeDecodeError: 'utf8' codec in ArcPy script

Tags: arcgis-10.3, arcpy, cursor, python, unicode

I work with ArcMap 10.3 and Python 2.7.8. I have more than 500 shapefiles located in many folders and subfolders, all under one large directory. Using arcpy, I am trying to find every shapefile whose name begins with "mig" and whose attribute table has the value 20 in the field "YEUD", and then print the shapefiles that were found.
When I run this code:

import arcpy,os,fnmatch,unicodedata,codecs

rootPath = r"C:\Project\layers"   
pattern = 'mig*.shp'   
for root, dirs, files in os.walk(rootPath):   
    for filename in fnmatch.filter(files, pattern):   
        shp = os.path.join(root, filename)  
        if arcpy.ListFields(shp, "YEUD"):  
            print("{} has YEUD field".format(shp))   
            with arcpy.da.SearchCursor(shp, ["YEUD"]) as rows:  
                for row in rows:  
                    if row[0] == 52:
                        print("{} has a record with YEUD = wanted row".format(shp))  
                        break

I get an error when Python encounters files and folders whose names are written in a right-to-left script:

UnicodeDecodeError: 'utf8' codec can't decode byte 0xe7 in position 23: invalid continuation byte
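
The message itself can be reproduced without ArcPy. The following is a minimal sketch, assuming the folder names are stored in a Windows code page such as cp1255; that encoding is an assumption, since nothing above states which one is in use, and the path bytes are made up for illustration. It shows that the error means "these bytes are not valid UTF-8", and that the same bytes decode cleanly once the right encoding is named.

```python
# Hypothetical path bytes; the folder name is assumed to be encoded in cp1255.
raw_path = b"C:\\Project\\layers\\\xe7\xe9\xf4\xe4\\mig1.shp"

try:
    raw_path.decode("utf8")
    failure = None
except UnicodeDecodeError as exc:
    # 0xe7 announces a multi-byte UTF-8 sequence, but the byte after it
    # is not a valid continuation byte, so decoding stops here.
    failure = exc

# Decoding with the encoding the bytes were actually written in succeeds.
text_path = raw_path.decode("cp1255")

print(failure.reason)
print(repr(text_path))
```

The position reported in the error is the byte offset of the first undecodable byte, which helps locate the offending folder or file name inside a long path.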

For completeness, I also asked this question at https://geonet.esri.com/message/519769#519769 and marked an answer there as correct, because it works for file and folder names written in left-to-right scripts. When I run the code on file and folder names written in right-to-left scripts, however, I get the error above.

I did not receive a helpful answer for this case on GeoNet. I also searched Stack Overflow, but did not understand how to make the script handle Unicode.

Best Answer

You are probably trying to print Unicode characters to a terminal that cannot display them. I'd suggest writing the results to a file instead, with something like this:

import arcpy,os,fnmatch

# Unicode root path: in Python 2, os.walk then returns unicode names,
# so u"...".format(shp) does not have to decode non-ASCII bytes implicitly.
rootPath = u"C:\\Project\\layers"
pattern = 'mig*.shp'

with open('results.log', 'w') as logfile:
  for root, dirs, files in os.walk(rootPath):
    for filename in fnmatch.filter(files, pattern):
      shp = os.path.join(root, filename)
      if arcpy.ListFields(shp, "YEUD"):
        # Encode explicitly, because the file is opened in byte mode.
        logfile.write(u"{} has YEUD field\n".format(shp).encode('utf8'))
        with arcpy.da.SearchCursor(shp, ["YEUD"]) as rows:
          for row in rows:
            if row[0] == 52:
              logfile.write(u"{} has a record with YEUD = wanted row\n".format(shp).encode('utf8'))
              break

I don't have any files to test whether this works, though. If it does not, you might want to try an encoding other than utf8 (probably your national encoding).
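
If the right encoding is not known in advance, one option is to try a few candidates in order. This is only a sketch: the helper name `to_text` and the candidate list `("utf8", "cp1255")` are assumptions for illustration, and the list should be replaced with whatever encodings are plausible on your system.

```python
def to_text(value, encodings=("utf8", "cp1255")):
    """Return value as unicode text, trying each candidate encoding in order.

    Hypothetical helper; the default encodings are assumptions.
    """
    if not isinstance(value, bytes):
        return value  # already unicode text, nothing to decode
    for encoding in encodings:
        try:
            return value.decode(encoding)
        except UnicodeDecodeError:
            continue  # not valid in this encoding, try the next one
    # Last resort: decode with the first encoding and mark bad bytes
    # with a replacement character instead of raising.
    return value.decode(encodings[0], "replace")


# Usage: bytes that are not valid UTF-8 fall through to the second candidate.
name = to_text(b"\xe7\xe9\xf4\xe4")
print(repr(name))
```

Trying utf8 first is a deliberate choice: text in a single-byte code page rarely happens to be valid UTF-8, so a successful UTF-8 decode is a strong hint that the guess is right, whereas a single-byte code page will accept almost any byte sequence.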
