I am using getenv("HOME") in C to get the user's home directory in order to read/write a settings file. But is it possible that the home directory name could contain characters that cannot be represented as an 8-bit char (for example, Unicode text encoded as UTF-8)?

Does this differ for various varieties of Linux and *BSD?

Thanks in advance...

+1  A: 

Yes, it is possible that the string could be UTF-8; in that case, the value of $HOME is a valid UTF-8 string that contains only complete, valid UTF-8 characters. Note that UTF-8 simply uses most of the possible 8-bit character values (all except 0xC0, 0xC1, and 0xF5..0xFF), so it fits in an ordinary char array. That means you don't have to worry very much about it unless you want to. In particular, UTF-8 uses a zero byte only to encode U+0000, which is equivalent to ASCII NUL or '\0' and is encoded in a single byte (value 0), so a UTF-8 string never contains an embedded zero byte that would end a C string early.
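As a rough illustration of those byte-level properties, here is a minimal sketch. The helper names `has_non_ascii` and `has_byte_unused_by_utf8` are made up for this example, and the second one is only a quick sanity check on individual byte values, not a full UTF-8 validator.

```c
#include <stdbool.h>

/* Hypothetical helper: true if the string contains any byte outside
 * the 7-bit ASCII range. The cast matters because plain char may be
 * signed, in which case bytes >= 0x80 would compare as negative. */
bool has_non_ascii(const char *s)
{
    for (; *s != '\0'; s++) {
        if ((unsigned char)*s >= 0x80)
            return true;
    }
    return false;
}

/* Hypothetical helper: true if the string contains a byte value that
 * never occurs in UTF-8 (0xC0, 0xC1, 0xF5..0xFF). A false result does
 * NOT prove the string is valid UTF-8; it only rules out these bytes. */
bool has_byte_unused_by_utf8(const char *s)
{
    for (; *s != '\0'; s++) {
        unsigned char b = (unsigned char)*s;
        if (b == 0xC0 || b == 0xC1 || b >= 0xF5)
            return true;
    }
    return false;
}
```

Both loops stop at the first zero byte, which is safe for UTF-8 because no multi-byte sequence contains one.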

The conclusion doesn't vary across platforms; different systems may make it more or less difficult to create home directories that need non-ASCII UTF-8 characters.
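In practice this means the value from getenv("HOME") can be treated as an opaque sequence of bytes and passed straight through to the file APIs. A minimal sketch of building the settings file path, assuming a made-up helper name `build_settings_path` and a made-up file name `.myapprc`:

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: writes "<home>/<name>" into buf.
 * Returns 0 on success, or -1 if home is NULL or empty, or if the
 * result does not fit in buflen bytes (including the terminator).
 * The bytes of home are copied unchanged, whatever their encoding. */
int build_settings_path(const char *home, const char *name,
                        char *buf, size_t buflen)
{
    if (home == NULL || home[0] == '\0')
        return -1;

    int n = snprintf(buf, buflen, "%s/%s", home, name);
    if (n < 0 || (size_t)n >= buflen)
        return -1;  /* encoding error or truncated output */

    return 0;
}
```

A caller would pass `getenv("HOME")` as the first argument and hand the resulting buffer to fopen(). Since getenv() returns NULL when the variable is unset, the helper checks for that rather than assuming $HOME always exists.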

See also: SO 164430

Jonathan Leffler
Where does this requirement come from, that $HOME be UTF-8 and not, say, ISO-8859-1? Unix systems in general do not impose any requirement on the character set of filenames, as long as '/' and '\0' mean the same thing as they do in ASCII.
Lars Wirzenius
No requirement that it is UTF-8 - but it could be in UTF-8 and it would present no problems. See also: http://stackoverflow.com/questions/164430/why-is-it-that-utf-8-encoding-is-used-when-interacting-with-a-unix-linux-environm/164447#164447
Jonathan Leffler
Great answer, thanks. My concern was that wide characters would just get coerced to ASCII - this happens in Windows if you use SHGetFolderPathA to get the user's home dir.
Peter Hull