On a modern Unix or Linux system, how can you tell which code set the /etc/passwd file stores user names in? Are user names allowed to contain accented characters (from the range 0x80..0xFF in, say, ISO 8859-1 or 8859-15)? Can the /etc/passwd file contain UTF-8? Can you tell that it contains UTF-8? What about the plain text of passwords before they are encrypted or hashed?

Clearly, if the user names and other data are limited to the 0x00..0x7F range (and exclude 0x00 anyway), then there is no difference between UTF-8, 8859-1 and 8859-15; the characters present are all encoded the same way.
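A minimal Python sketch of that point, using only the standard codecs. Plain ASCII bytes come out identical in all three encodings, while an accented letter does not, and a single high byte can even mean different characters in 8859-1 and 8859-15. The function name encodings_of is purely illustrative.

```python
# Compare the byte encodings of sample user names in three code sets.
CODECS = ("utf-8", "iso8859-1", "iso8859-15")


def encodings_of(text: str) -> dict[str, bytes]:
    """Return a mapping of codec name -> encoded bytes for `text`."""
    return {codec: text.encode(codec) for codec in CODECS}


if __name__ == "__main__":
    for name in ("jurgen", "jürgen"):
        for codec, data in encodings_of(name).items():
            print(f"{name:8} {codec:11} {data.hex(' ')}")
    # The same byte 0xA4 decodes to different characters in the two 8859 variants.
    print(b"\xa4".decode("iso8859-1"), b"\xa4".decode("iso8859-15"))
```

For "jurgen" all three lines show the same hex; for "jürgen" UTF-8 gives 6a c3 bc 72 67 65 6e while both 8859 variants give 6a fc 72 67 65 6e. The last line prints ¤ (8859-1) next to € (8859-15).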

Also, I'm using /etc/passwd as an abbreviation for something along the lines of "the user identification and authentication database (sometimes termed a directory service) on a Unix-based machine, usually accessed via PAM and sometimes hosted on entirely different machines from the local one, but sometimes still actually a file on the local hard disk, conventionally called /etc/passwd, often supported by /etc/shadow". I'm also assuming that the equivalent questions about the group database (often the /etc/group file) have the same answer.
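As for "can you tell that it contains UTF-8?", one heuristic is to look at the raw bytes of each user name: pure ASCII is unambiguous, bytes that fail strict UTF-8 decoding are some 8-bit code set, and bytes that decode cleanly are probably (not certainly) UTF-8, since some 8859-1 text happens to be valid UTF-8 too. The sketch below assumes CPython's pwd module, which enumerates users through getpwent() and so goes through the name service layer rather than reading the file directly; a network directory may refuse enumeration, so the list can be incomplete. classify and non_ascii_users are hypothetical helper names.

```python
import os


def classify(data: bytes) -> str:
    """Classify raw bytes as 'ascii', 'utf-8', or '8-bit' (not valid UTF-8)."""
    if data.isascii():
        return "ascii"
    try:
        data.decode("utf-8")  # strict error handling by default
    except UnicodeDecodeError:
        return "8-bit"
    return "utf-8"  # valid UTF-8, but that alone does not prove intent


def non_ascii_users():
    """Yield (raw_name, classification) for every enumerable non-ASCII user name."""
    import pwd  # Unix-only standard module

    for entry in pwd.getpwall():
        # CPython decodes pw_name with the filesystem encoding and
        # surrogateescape; os.fsencode reverses that to recover the raw bytes.
        raw = os.fsencode(entry.pw_name)
        kind = classify(raw)
        if kind != "ascii":
            yield raw, kind


if __name__ == "__main__":
    for raw, kind in non_ascii_users():
        print(kind, raw)
```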

A:

It's all ASCII. But the password itself is never stored, only the result of the one-way hash. If you're wondering which characters can be in the password itself, that depends on the locale, which restricts the characters your terminal is able to handle. See "man locale".

From the BSD man page:

"/etc/passwd ASCII password file..."

As for usernames, I can tell you that Solaris only supports ASCII. I can't speak for other Unix-en.

"Not every object in Solaris 2 and Solaris 7 can have names composed of arbitrary characters. The names of the following objects must be composed of ASCII characters:

* User names, group name, and passwords
* System name ..."

nont
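The Solaris rule lines up with POSIX's guidance that a portable user name should use only the portable filename character set (A-Z, a-z, 0-9, period, underscore, hyphen) and should not start with a hyphen. A minimal sketch of that check follows; is_portable_user_name is a hypothetical helper, and real account tools on a given system may apply stricter or looser rules of their own.

```python
import string

# POSIX portable filename character set: A-Z a-z 0-9 . _ -
PORTABLE_CHARS = frozenset(string.ascii_letters + string.digits + "._-")


def is_portable_user_name(name: str) -> bool:
    """True if `name` is non-empty, uses only the portable filename
    character set, and does not begin with a hyphen."""
    return (
        bool(name)
        and not name.startswith("-")
        and all(ch in PORTABLE_CHARS for ch in name)
    )


if __name__ == "__main__":
    for candidate in ("cesar", "césar", "jürgen", "-admin"):
        print(candidate, is_portable_user_name(candidate))
```

By this test "cesar" passes, while "césar" and "jürgen" do not, which is the practical answer to whether such names are portable.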
Can you provide documentary evidence (a URL) for the assertion? And what happens if non-ASCII bytes are entered into the /etc/passwd file?
Jonathan Leffler
Thanks for the extra info. Supposing you have a machine accessed by people from the USA (terminals in 8859-1), Germany (8859-15) and Taiwan (UTF-8). Which code set is the password file stored in now?
Jonathan Leffler
It's still all ASCII. What you can type will vary with the terminal's locale, but the output of the hash function will always be ASCII. For example, you can create a file containing whatever characters you like and then run md5 on it; the character set of the original file has nothing to do with the character set of the resulting hash. (Traditionally, the passwd hash is generated by crypt, not md5.)
nont
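A small Python illustration of the md5 argument, with hashlib.md5 standing in for crypt (it is not how passwords should actually be stored). The hex digest is always ASCII whatever the input bytes were. It also shows a consequence worth noting: the same visible password typed on a UTF-8 terminal and on an 8859-1 terminal yields different bytes and therefore a different hash, so an accented password may only work from a terminal using the encoding it was set with. md5_hex is a hypothetical helper name.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of `data` as a lowercase hex (ASCII) string."""
    return hashlib.md5(data).hexdigest()


if __name__ == "__main__":
    password = "jürgen"
    for codec in ("utf-8", "iso8859-1"):
        digest = md5_hex(password.encode(codec))
        # Different input bytes give different digests, but both are ASCII.
        print(f"{codec:10} {digest} ascii={digest.isascii()}")
```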
OK - I know that the hashed or encrypted password is stored in plain ASCII; that is almost coincidental to my main question - which is what is the user name stored in? What are the constraints on what is allowed? Can 'césar' and 'jürgen' use those names?
Jonathan Leffler
I found the answer for Solaris. It's probably the same for most Unix systems, but I can't say that for sure.
nont
Solaris 10 with 'man -s 4 passwd' includes the words '[t]he password file is an ASCII file that ...'. So, it appears that you are not supposed to work with people who have accents in their names. I'm not sure whether that's a legacy doc defect or an actual restriction.
Jonathan Leffler