When we invoke a system call on Linux such as 'open', or a stdio function such as 'fopen', we must provide a 'const char *filename'. My question is: what encoding is used here? Is it UTF-8, ASCII, or ISO 8859-x? Does it depend on the system or environment settings?

I know that MS Windows has _wopen, which accepts UTF-16.

+6  A: 

It's a byte string; the interpretation is up to the particular filesystem.

Andrew McGregor
+7  A: 

It depends on the system locale. Look at the output of the "locale" command. If the values of the variables end in UTF-8, then your locale is UTF-8. Most modern Linux distributions use UTF-8. Although Andrew is correct that technically it's just a byte string, if your filenames don't match the system locale, some programs may not work correctly, and filenames typed by the user won't match the names stored on disk. It's best to stick with UTF-8.
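To check the same thing from C rather than from the shell, a minimal sketch could look like the following. It assumes a POSIX system that provides nl_langinfo(); the helper name current_codeset is made up for illustration.

```c
#include <langinfo.h>
#include <locale.h>

/* Hypothetical helper: return the character encoding name of the
 * locale selected by the environment (LANG, LC_ALL, LC_CTYPE, ...). */
const char *current_codeset(void)
{
    /* Without this call the program stays in the default "C" locale. */
    setlocale(LC_ALL, "");

    /* CODESET names the encoding of the current locale, for example
     * "UTF-8" in a UTF-8 locale. The returned string is owned by the
     * library and must not be freed. */
    return nl_langinfo(CODESET);
}
```

Printing the result with puts(current_codeset()) shows which encoding programs in this environment will assume for filenames. The kernel itself does not consult this value.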

Matthew Talbert
Note that it is possible to have files whose names are in an encoding other than the system default, for example if you extract an archive (tarball, ZIP, etc.) that was packed by someone using a different encoding than yours.
alvherre
Indeed, this is very true. Don't we wish that everyone used UTF-8?
Matthew Talbert
+3  A: 

Filesystem calls on Linux are encoding-agnostic, i.e. they do not (need to) know about the particular encoding. As far as they are concerned, the byte string pointed to by the filename argument is passed down to the filesystem as-is; only the '/' separator and the terminating NUL byte have special meaning. Filenames are expected to be in the correct encoding (usually UTF-8, as mentioned by Matthew Talbert).

This means that you often don't need to do anything (filenames are treated as opaque byte-strings), but it really depends on where you receive the filename from, and whether you need to manipulate the filename in any way.
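As a rough illustration of the "opaque byte string" behaviour, the sketch below creates a file and then checks that the directory listing returns exactly the same bytes. Both helper names (create_empty_file, dir_has_exact_name) are made up for this example, and it assumes a POSIX system with a writable directory.

```c
#include <dirent.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: create an empty file whose name is the given
 * byte string. Returns 0 on success, -1 on failure. */
int create_empty_file(const char *name)
{
    /* The bytes in name are handed to the kernel unchanged. */
    int fd = open(name, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    return close(fd);
}

/* Hypothetical helper: return 1 if directory dir has an entry whose
 * name equals name byte for byte, 0 if not, -1 on error. */
int dir_has_exact_name(const char *dir, const char *name)
{
    DIR *d = opendir(dir);
    if (d == NULL)
        return -1;

    int found = 0;
    struct dirent *entry;
    while ((entry = readdir(d)) != NULL) {
        /* strcmp compares raw bytes; no encoding is involved. */
        if (strcmp(entry->d_name, name) == 0) {
            found = 1;
            break;
        }
    }
    closedir(d);
    return found;
}
```

Calling create_empty_file("caf\xc3\xa9.txt") stores the UTF-8 bytes for "café.txt". On a typical Linux filesystem such as ext4, a name like "caf\xe9.txt" (Latin-1, not valid UTF-8) should be accepted just the same, which is how the mixed-encoding names mentioned above can end up on disk.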

JesperE