[GIS] Understanding UnicodeDecodeError: ‘utf8’ codec in ArcPy script

arcgis-10.3 arcpy cursor python unicode

I work with ArcMap 10.3 and Python 2.7.8. I have more than 500 shapefiles located in many folders and subfolders, all under one large directory. Using arcpy, I am trying to find every shapefile whose name begins with the letters "mig" and whose attribute table has the value 20 in the field "YEUD". Finally, I try to print all the shapefiles that were found with that value.
When I run this code:

import arcpy,os,fnmatch,unicodedata,codecs

rootPath = r"C:\Project\layers"   
pattern = 'mig*.shp'   
for root, dirs, files in os.walk(rootPath):   
    for filename in fnmatch.filter(files, pattern):   
        shp = os.path.join(root, filename)  
        if arcpy.ListFields(shp, "YEUD"):  
            print("{} has YEUD field".format(shp))   
            with arcpy.da.SearchCursor(shp, ["YEUD"]) as rows:  
                for row in rows:  
                    if row[0] == 52:
                        print("{} has a record with YEUD = wanted row".format(shp))  
                        break

I get an error when Python encounters files and folders whose names are written in a right-to-left script:

UnicodeDecodeError: 'utf8' codec can't decode byte 0xe7 in position 23: invalid continuation byte

For completeness, I asked this question at https://geonet.esri.com/message/519769#519769 and marked an answer there as correct for file and folder names written in left-to-right scripts, but when I run this code I get an error as soon as Python encounters file and folder names written in right-to-left scripts.

On GeoNet I didn't receive a helpful answer. I also searched for answers on Stack Overflow but didn't understand how to make the script handle Unicode.

Best Answer

You are probably trying to output Unicode characters to a terminal that cannot handle them. I'd suggest writing the results to a file instead, so that you can do something like this:

import arcpy,os,fnmatch,unicodedata,codecs

rootPath = r"C:\Project\layers"   
pattern = 'mig*.shp'   

with open('results.log', 'w') as logfile:
  for root, dirs, files in os.walk(rootPath):   
    for filename in fnmatch.filter(files, pattern):   
      shp = os.path.join(root, filename)  
      if arcpy.ListFields(shp, "YEUD"):  
        logfile.write(u"{} has YEUD field\n".format(shp).encode('utf8'))
        with arcpy.da.SearchCursor(shp, ["YEUD"]) as rows:
          for row in rows:  
            if row[0] == 52:
              logfile.write(u"{} has a record with YEUD = wanted row\n".format(shp).encode('utf8'))
              break

I don't have any files to test whether it works, though. If it does not, you might need an encoding other than utf8 (probably your national encoding).
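
As a rough illustration of the encoding advice above, the sketch below decodes byte-string file names before logging them. It assumes the non-ASCII names are stored in the Windows Hebrew code page cp1255, which is only a guess based on the right-to-left names; substitute your own encoding. `decode_name` and `write_log` are hypothetical helper names, not part of arcpy.

```python
# -*- coding: utf-8 -*-
import io


def decode_name(raw, encodings=("utf-8", "cp1255")):
    """Turn a byte-string path into text, trying each encoding in order.

    "cp1255" is an assumption (Windows Hebrew); replace it with the
    code page your file names actually use.
    """
    if not isinstance(raw, bytes):
        return raw  # already text, nothing to decode
    for enc in encodings:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue
    # Last resort: keep going, marking undecodable bytes with U+FFFD.
    return raw.decode(encodings[0], "replace")


def write_log(paths, log_path):
    """Write one decoded path per line to a UTF-8 encoded log file."""
    with io.open(log_path, "w", encoding="utf-8") as logfile:
        for path in paths:
            logfile.write(u"{} has YEUD field\n".format(decode_name(path)))
```

In Python 2, passing a unicode root path to os.walk also makes it return unicode names, which may remove the need for manual decoding altogether.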

Related Solutions

[GIS] Convert multi KML and KMZ into shapefile using ArcPy

I think you are seeing that particular error message from this line of your code:

arcpy.KMLToLayer_conversion( r"C:\Project\gis\layers" ,r'C:\Project\gis')

arcpy.KMLToLayer_conversion expects a file as its first parameter (KML or KMZ) but you are giving it a folder name.

You could try joining that folder path and the contents of your filename variable, with the appropriate delimiter, and passing the resulting file path instead.
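
A minimal sketch of that idea, assuming the KML and KMZ files sit directly inside one folder; `find_kml_files` is a hypothetical helper name, and the arcpy call is shown only in a comment because it needs an ArcGIS installation:

```python
import os


def find_kml_files(folder):
    """Return full paths of the .kml/.kmz files directly inside `folder`."""
    paths = []
    for name in sorted(os.listdir(folder)):
        if name.lower().endswith((".kml", ".kmz")):
            # os.path.join inserts the right delimiter between folder and file.
            paths.append(os.path.join(folder, name))
    return paths


# Each path can then be passed on its own as the first parameter, e.g.
#   for kml in find_kml_files(r"C:\Project\gis\layers"):
#       arcpy.KMLToLayer_conversion(kml, r"C:\Project\gis")
```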

[GIS] Merge all shp files in a folder into one with a new field populated with the source filename

I'd suggest using ogr2ogr directly in a terminal script instead.

In summary (using the syntax from the linked post), to merge all .shp into merged.shp (both in CWD), with the filename (without extension) added as a column, run from within

Bash (Linux):

for file in *.shp
do
  if [ -f merged.shp ]
    then
      ogr2ogr -f "ESRI Shapefile" merged.shp "$file" -update -append -dialect "SQLite" -sql "SELECT '${file%.*}' AS filename, * FROM ${file%.*}"
    else
      ogr2ogr -f "ESRI Shapefile" merged.shp "$file" -dialect "SQLite" -sql "SELECT '${file%.*}' AS filename, * FROM ${file%.*}"
  fi
done

CMD (Windows Command Line):

for %F in (*.shp) do (
  if not exist merged.shp (
    ogr2ogr -f "ESRI Shapefile" merged.shp %F -dialect "SQLite" -sql "SELECT '%~nF' AS filename, * FROM %~nF"
  ) else (
    ogr2ogr -f "ESRI Shapefile" merged.shp %F -update -append -dialect "SQLite" -sql "SELECT '%~nF' AS filename, * FROM %~nF"
  )
)
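
For readers who prefer to drive the same loop from Python, the sketch below builds the equivalent ogr2ogr argument list for one shapefile. It uses only the flags shown above; `build_merge_command` is a hypothetical helper name, and the layer name is assumed to be the file name without its extension and to need no SQL quoting.

```python
import os


def build_merge_command(shp_path, merged="merged.shp", append=False):
    """Build the ogr2ogr argument list that adds one shapefile to `merged`."""
    # Assumption: the layer name equals the file name without extension.
    layer = os.path.splitext(os.path.basename(shp_path))[0]
    sql = "SELECT '{0}' AS filename, * FROM {0}".format(layer)
    cmd = ["ogr2ogr", "-f", "ESRI Shapefile", merged, shp_path]
    if append:
        # merged.shp already exists, so add to it instead of creating it.
        cmd += ["-update", "-append"]
    cmd += ["-dialect", "SQLite", "-sql", sql]
    return cmd
```

Each list could then be handed to subprocess.check_call, with append set to os.path.exists(merged) at the time of the call.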

To add to this, there is actually no need to handle the non-existing file case, since ogr2ogr will (at least in recent versions) create the file even in -append mode: