You wouldn't think something as basic as opening a file with the C++ standard library in a Windows application would be tricky, but it appears to be. By Unicode I mean UTF-8 here, though I can convert to UTF-16 or whatever is needed; the point is getting an ofstream instance from a Unicode filename. Before I hack up my own solution, is there a preferred route? Especially a cross-platform one?

A: 

I think this is a duplicate question. See if any of the answers there can help.

Yorgos Pagles
That question is primarily concerned with code pages and only asks about C++ and Unicode as a sub-question, and the answers to that secondary question aren't very good. We need a place for the specific question of using the standard C++ library with Unicode filenames.
Metashad
+1  A: 

In current versions of Visual C++, std::basic_fstream has an open() method that takes a const wchar_t* filename, according to http://msdn.microsoft.com/en-us/library/4dx08bh4.aspx.

John Downey
Will this ultimately (or at least theoretically) be portable?
Metashad
Not all operating systems and file systems support Unicode file names, so it would not be portable. From what I can gather, the wchar_t* open() and constructor on fstream are Microsoft extensions, because NTFS does support Unicode file names.
John Downey
Or rather, because NTFS uses UTF-16 to encode Unicode filenames. Linux supports Unicode filenames too, but (by convention) encodes them as UTF-8, so the regular char* version works there.
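A minimal sketch of that split, assuming Visual C++ on Windows (where the const wchar_t* constructor exists as a Microsoft extension) and the UTF-8 filename convention everywhere else. open_utf8 is a hypothetical helper name, not a library function; the Windows branch converts the name with the Win32 MultiByteToWideChar call before handing it to the wide overload.

```cpp
#include <fstream>
#include <ios>
#include <string>

#if defined(_MSC_VER)
#include <windows.h>
#endif

// Hypothetical helper: open an output file whose name is given as UTF-8.
inline std::ofstream open_utf8(const std::string& utf8_name,
                               std::ios_base::openmode mode = std::ios_base::out) {
#if defined(_MSC_VER)
    // Windows/NTFS: convert UTF-8 to UTF-16, then use the Visual C++
    // const wchar_t* constructor (an extension, not standard C++03).
    // With a length of -1 the returned count includes the terminating null.
    int n = MultiByteToWideChar(CP_UTF8, 0, utf8_name.c_str(), -1, nullptr, 0);
    std::wstring wide(n > 0 ? n : 1, L'\0');
    if (n > 0) {
        MultiByteToWideChar(CP_UTF8, 0, utf8_name.c_str(), -1, &wide[0], n);
    }
    // If conversion failed, wide is empty and the stream simply fails to open.
    return std::ofstream(wide.c_str(), mode);
#else
    // POSIX systems: filenames are byte strings, conventionally UTF-8,
    // so the standard const char* constructor passes the bytes through unchanged.
    return std::ofstream(utf8_name.c_str(), mode);
#endif
}
```

Callers then check the returned stream as usual (if (!out) ...); only the Windows branch needs to know about UTF-16.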
jalf
+8  A: 

The C++ standard library is not Unicode-aware. char and wchar_t are not required to be Unicode encodings.

On Windows, wchar_t is UTF-16, but there's no direct support for UTF-8 in the standard library (on Windows, narrow char filenames are interpreted in the active ANSI code page by default, not as UTF-8).
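One way to see how a given compiler treats wchar_t, using only standard headers: its width is implementation-defined, and the predefined macro __STDC_ISO_10646__ is set only when the implementation promises that wchar_t values are ISO 10646 (Unicode) code points. The function names below are hypothetical, for illustration only.

```cpp
#include <climits>
#include <cstddef>

// Width of wchar_t in bits. Visual C++ uses 16 bits (UTF-16 code units);
// GCC and Clang on Linux typically use 32 bits.
constexpr std::size_t wchar_bits() { return sizeof(wchar_t) * CHAR_BIT; }

// True only if the implementation defines __STDC_ISO_10646__, i.e. it
// guarantees that each wchar_t value is an ISO 10646 code point.
constexpr bool wchar_is_iso10646() {
#ifdef __STDC_ISO_10646__
    return true;
#else
    return false;
#endif
}
```

Neither value is fixed by the C++ standard, which is why code that assumes "wchar_t means UTF-16" only holds on Windows.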

On Windows, Visual C++ provides a file stream constructor that takes a const wchar_t* filename, allowing you to create the stream as:

#include <fstream>

wchar_t name[] = L"filename.txt";  // UTF-16 on Windows
std::fstream file(name);           // Visual C++ extension, not standard C++03

However, this overload is not specified by the C++98/03 standard, which only guarantees the const char* version.

Note that just as char on Windows is not UTF-8, wchar_t on other OSes may not be UTF-16. So overall, this isn't likely to be portable: opening a stream from a wchar_t filename isn't defined by the standard, and specifying the filename in chars may be difficult because the encoding used by char varies between OSes.
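For readers on newer compilers: C++17 (well after this thread) added std::filesystem, and the file stream constructors and open() gained overloads taking a const std::filesystem::path&, which holds the name in the platform's native encoding (UTF-16 on Windows). A minimal sketch, assuming a C++17 standard library. std::filesystem::u8path interprets its argument as UTF-8 (it is deprecated in C++20 in favour of building a path from a char8_t string), and open_utf8_path is a hypothetical helper name.

```cpp
#include <filesystem>
#include <fstream>
#include <string>

// Hypothetical helper: open a UTF-8-named file through std::filesystem::path (C++17).
inline std::ofstream open_utf8_path(const std::string& utf8_name) {
    // u8path decodes UTF-8 into the native path encoding:
    // UTF-16 wchar_t on Windows, the same bytes on POSIX systems.
    const std::filesystem::path p = std::filesystem::u8path(utf8_name);
    // Since C++17, basic_ofstream has a constructor taking const path&.
    return std::ofstream(p);
}
```

The appeal of this route is that the same source compiles unchanged on Windows and POSIX, with no #ifdef and no vendor extension.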

jalf
What do you mean by "fstream is guaranteed to accept both wchar_t..."? I don't have access to the official '98 standard, but I can't find any mention of a wchar_t* constructor for basic_fstream in n2857 (i.e., IIUC, the current C++0x working draft).
Éric Malenfant
Hmm, looks like you're right. I stand corrected
jalf
@Éric: agreed. The constructors for basic_fstream are defined in 27.8.1.12 of the '03 standard, and there are two: no-args and char*. fstream is basic_fstream<char>, and no additional members are defined for that specialization.
Steve Jessop
Edited my answer to reflect this
jalf