I have a DXF file encoded in ANSI_1252, with object attributes containing umlauts. When I open it in QGIS, the umlauts don't display properly. This apparently happens because QGIS automatically sets the character encoding to UTF-8. In Layer Properties / General the Data source encoding is set to UTF-8 and greyed out, so I can't change it there either. How can I keep the ANSI_1252 encoding?
[GIS] Stop automatic ANSI_1252 to UTF-8 encoding when adding a DXF layer
dxf encoding qgis
Related Solutions
Shapefiles get their codepage either from the .dbf or from the .cpg file.
The .dbf file has a byte that represents DBF Language Driver ID. There's some discussion about these in an archived ArcGIS Desktop forum on forums.esri.com. There's a Microsoft Knowledge Base article Understanding Code Pages in Visual FoxPro which lists 19 DBF Language Driver IDs and their corresponding codepages.
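To see which Language Driver ID a given .dbf actually carries, you can read the byte directly. The sketch below assumes the usual dBASE/FoxPro header layout, where the LDID sits at byte offset 29 of the 32-byte file header. The function name `read_ldid` and the small lookup table are illustrative only, and the table is deliberately partial (the 0x57 entry follows the GDAL behaviour quoted further down).

```python
# Partial, illustrative mapping of Language Driver IDs to Python codec names.
# Consult the full Language Driver ID table for anything not listed here.
LDID_TO_CODEC = {
    0x01: "cp437",      # U.S. MS-DOS
    0x02: "cp850",      # International MS-DOS
    0x03: "cp1252",     # Windows ANSI
    0x57: "iso8859_1",  # "ANSI"; GDAL treats this as ISO8859_1
}


def read_ldid(dbf_path):
    """Return the Language Driver ID byte of a .dbf file (header offset 29)."""
    with open(dbf_path, "rb") as f:
        header = f.read(32)
    if len(header) < 32:
        raise ValueError("file is too short to contain a .dbf header")
    return header[29]
```

A value of 0 means no code page is recorded in the .dbf, in which case only a .cpg file (if present) tells a reader which encoding to use.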
The ArcGIS Resource Center page on shapefile file extensions states that the .cpg is an optional file that specifies the code page used to identify the character set.
In ArcGIS, if a .cpg file is present it will take precedence over the DBF Language Driver ID in the .dbf file. This is generally preferred because the DBF Language Driver ID covers languages supported during the dBASE IV era whereas the .cpg file supports any codepage.
The Moldova shapefiles are using a UTF-8 encoding. You can only specify UTF-8 encoding using the .cpg file. Therefore you will need to create a .cpg text file for each shapefile and place either 65001 or UTF-8 in its body. For your convenience I've included the following MAKECPG.BAT batch file which you can save and run to create the .cpg files:
REM MAKECPG.BAT
ECHO 65001 > moldova_administrative.cpg
ECHO 65001 > moldova_coastline.cpg
ECHO 65001 > moldova_highway.cpg
ECHO 65001 > moldova_location.cpg
ECHO 65001 > moldova_natural.cpg
ECHO 65001 > moldova_poi.cpg
ECHO 65001 > moldova_water.cpg
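If you have many shapefiles, or are not on Windows, the same thing can be done with a short script. This is a minimal sketch, not part of the original answer: the function name `write_cpg_files` is made up for illustration, and it assumes every .shp in the folder should be marked with the same encoding.

```python
from pathlib import Path


def write_cpg_files(folder, encoding="UTF-8"):
    """Write a .cpg file next to every .shp in `folder`.

    Returns the list of .cpg paths that were written. An existing .cpg
    with the same base name is overwritten.
    """
    written = []
    for shp in sorted(Path(folder).glob("*.shp")):
        cpg = shp.with_suffix(".cpg")
        # The body of the file is just the code page name, e.g. UTF-8 or 65001.
        cpg.write_text(encoding, encoding="ascii")
        written.append(cpg)
    return written
```

Calling `write_cpg_files("C:/path/to/moldova")` (a placeholder path) would create one .cpg per shapefile. Note that the `*.shp` pattern is case-sensitive on some file systems, so files named `.SHP` may be skipped there.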
Quoting from the GDAL documentation for the Esri Shapefile driver:
An attempt is made to read the LDID/codepage setting from the .dbf file and use it to translate string fields to UTF-8 on read, and back when writing. LDID "87 / 0x57" is treated as ISO8859_1 which may not be appropriate. The SHAPE_ENCODING configuration option may be used to override the encoding interpretation of the shapefile with any encoding supported by CPLRecode or to "" to avoid any recoding. (Recoding support is new for GDAL/OGR 1.9.0)
Depending on which GDAL/OGR version you are using, ogr2ogr may be translating your data to UTF-8 or doing nothing at all.
So you would do either:
ogr2ogr --config SHAPE_ENCODING "UTF-8" -f "MySql" MySql:"basemap,host=127.0.0.1,user=myUser,password=myPass,port=3306" -lco engine=MYISAM "C:/path/to/data/tl_2012_us_county.shp" -nln county -nlt "geometry" -s_srs EPSG:4269 -t_srs EPSG:3857
or
ogr2ogr --config SHAPE_ENCODING "" -f "MySql" MySql:"basemap,host=127.0.0.1,user=myUser,password=myPass,port=3306" -lco engine=MYISAM "C:/path/to/data/tl_2012_us_county.shp" -nln county -nlt "geometry" -s_srs EPSG:4269 -t_srs EPSG:3857
Finally, check that your MySQL database is using UTF-8 and not Latin-1.
Best Answer
One possible solution: open the DXF file in a text editor and save it with the encoding that QGIS expects. In my case I had a DXF file with Swedish text, exported from AutoCAD in UTF-8. I found by trial and error that after saving the DXF file with ANSI encoding, the text displayed correctly in QGIS.
I opened the DXF file in Notepad++, converted the encoding to ANSI, saved it, and then reopened it in QGIS.
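The same conversion can be scripted if you have several files. The sketch below is only an illustration of the Notepad++ step: `reencode_file` is a hypothetical helper, and it assumes that "ANSI" on your system means Windows-1252 (true for Western European Windows, not necessarily elsewhere) and that the DXF is a text (ASCII) DXF rather than a binary one.

```python
def reencode_file(src, dst, src_encoding="utf-8-sig", dst_encoding="cp1252"):
    """Rewrite a text file from one encoding to another.

    "utf-8-sig" reads UTF-8 with or without a byte order mark.
    newline="" keeps the original line endings untouched.
    Raises UnicodeEncodeError if a character has no equivalent in
    the target encoding, rather than silently dropping it.
    """
    with open(src, "r", encoding=src_encoding, newline="") as fin:
        text = fin.read()
    with open(dst, "w", encoding=dst_encoding, newline="") as fout:
        fout.write(text)
```

Write to a new file name rather than over the original, so you can go back if the result still displays incorrectly. To convert in the other direction (an ANSI_1252 file to UTF-8), swap the two encoding arguments.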