MATLAB: Problem opening files containing special characters


It seems there is a problem passing non-ASCII characters in filenames to fopen. Do I need to encode these somehow?
Any help appreciated, dave r.
System: OS X 10.6, English language, Swedish region
>> feature('DefaultCharacterSet')
ans =
ISO-8859-1
>> getenv('LANG')
ans =
sv_SE.ISO8859-1
Now, suppose I have a file called 'öäå.txt' (in case the encoding gets mangled here: that is ISO 8859-1 characters 246, 228, 229 followed by .txt). In MATLAB, I want to open the file:
>> id = fopen('öäå.txt','r','n','UTF-8')
id =
-1
As a workaround for a single file, I can use:
>> D = dir('*.txt')
D =
name: 'oÌaÌaÌ.txt'
date: '17-Jun-2011 16:02:36'
bytes: 987
isdir: 0
datenum: 7.3467e+05
>> id = fopen(D.name,'r','n','UTF-8')
id =
3
but I would like a solution where I can actually specify the filename directly!

Best Answer

So, following Walter's hint that this is actually UTF-8, we find that filenames on Macs are returned in decomposed form, whereas other systems use composed forms (or perhaps whatever was given by the user). See http://download.oracle.com/javase/6/docs/api/java/text/Normalizer.html
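To make the composed/decomposed difference concrete, here is a small sketch for the single character 'ö'. It builds both forms from code points so it does not depend on how the script file itself is encoded; the variable names are only illustrative.

```matlab
% Composed (NFC) vs decomposed (NFD) forms of 'ö', built from code points.
composed   = char(246);        % U+00F6, 'ö' as one code point
decomposed = char([111 776]);  % 'o' (U+006F) + combining diaeresis (U+0308)

% UTF-8 byte sequences of each form
bytesNFC = unicode2native(composed,   'UTF-8');
bytesNFD = unicode2native(decomposed, 'UTF-8');

disp(bytesNFC)                        % 195 182
disp(bytesNFD)                        % 111 204 136
disp(isequal(composed, decomposed))   % 0: the two strings differ
```

Byte 204 (0xCC) is 'Ì' in ISO-8859-1, which is consistent with the 'oÌ...' name that dir printed in the question: the decomposed UTF-8 bytes were being shown one byte per character.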
I didn't find any way to handle this natively in MATLAB, but Java provides the required methods:
%% some handy definitions
NFD  = javaMethod('valueOf', 'java.text.Normalizer$Form', 'NFD');
NFC  = javaMethod('valueOf', 'java.text.Normalizer$Form', 'NFC');
UTF8 = java.nio.charset.Charset.forName('UTF-8');
%% convert a name of a file from dir to a sensible MATLAB string:
D = dir('*.txt');
s2 = D.name;                                  % assumes dir matched exactly one file
s  = java.lang.String(uint8(s2), UTF8);       % treat the chars as raw UTF-8 bytes
sc = java.text.Normalizer.normalize(s, NFC);  % compose: o + diaeresis -> ö
sc = char(sc);
strcmp(sc,'öäå.txt')
ans =
1
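Note that `s2 = D.name` only works when dir matches a single file. A possible sketch for several files follows, under the same assumption as above that dir returns each name as raw UTF-8 bytes stored one per char (as in the ISO-8859-1 setup from the question). The variable `names` is just an illustrative choice.

```matlab
NFC  = javaMethod('valueOf', 'java.text.Normalizer$Form', 'NFC');
UTF8 = java.nio.charset.Charset.forName('UTF-8');

D = dir('*.txt');
names = cell(numel(D), 1);   % one composed name per directory entry
for k = 1:numel(D)
    % Assumption: each char of D(k).name holds one UTF-8 byte.
    raw = java.lang.String(uint8(D(k).name), UTF8);
    names{k} = char(java.text.Normalizer.normalize(raw, NFC));
end
```

The entries of `names` can then be compared against literals such as 'öäå.txt', while `D(k).name` remains the string to pass to fopen.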
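%% and the reverse, to open a file with accented characters:
filename = 'öäå.txt';
s  = java.lang.String(filename);
% wrap the result so sc is a Java String even if MATLAB hands it back as char
sc = java.lang.String(java.text.Normalizer.normalize(s, NFD));
bs = single(sc.getBytes(UTF8)');   % Java bytes arrive as signed int8
bs(bs<0) = 256 + bs(bs<0);         % map -128..-1 to 128..255
id = fopen(char(bs),'r')
id =
3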
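The signed-to-unsigned step can probably also be written with typecast, which reinterprets the int8 bytes as uint8 without the manual 256 offset. This is a sketch, assuming Java is available in the MATLAB session; the file name is built from the code points given in the question (246, 228, 229) so the example does not depend on the script's encoding, and `sd` is an illustrative name.

```matlab
% Build 'öäå.txt' from code points rather than from a literal.
filename = [char([246 228 229]) '.txt'];

NFD  = javaMethod('valueOf', 'java.text.Normalizer$Form', 'NFD');
UTF8 = java.nio.charset.Charset.forName('UTF-8');

% Decompose; the outer java.lang.String keeps a Java object either way.
sd = java.lang.String(java.text.Normalizer.normalize(java.lang.String(filename), NFD));

% getBytes gives signed int8; typecast reinterprets the same bits as uint8.
bs = typecast(sd.getBytes(UTF8), 'uint8')';

disp(bs)   % 111 204 136 97 204 136 97 204 138 46 116 120 116

% Same call as above; fopen returns -1 if no such file exists.
id = fopen(char(bs), 'r');
if id ~= -1
    fclose(id);
end
```

Whether `char(bs)` is the right thing to hand to fopen depends on the platform and MATLAB version; it matches the OS X 10.6 / ISO-8859-1 setup described in the question.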