
What are the valid characters of a hostname? This would be something like a networked computer or a web domain.

To put it in context, I am writing a PC game which connects to a remote server; so I have a field for hostname and a field for port. Obviously the port is a number in the Short range, but I need to know what all the possible hostname characters are (and any other pattern that might be required - does a hostname need to start with a letter?).

Examples of hostnames include localhost and google.com.

+2  A: 

Check out this wiki page, specifically the section "Restrictions on valid host names".

Aaron Hathaway
+2  A: 

It depends on whether you process IDNs before or after the IDN ToASCII algorithm (that is, do you see the domain name παράδειγμα.δοκιμή as παράδειγμα.δοκιμή or as xn--hxajbheg2az3al.xn--jxalpdlp?).

In the latter case, where you are handling IDNs in their punycode form, the old RFC 1123 rules apply:

U+0041 through U+005A (A-Z) and U+0061 through U+007A (a-z), which are case-folded to each other; U+0030 through U+0039 (0-9); and U+002D (-). [edit: and U+002E (.) of course. The rules for labels allow the others, with dots between labels; sometimes it's the obvious bits that are easiest to forget.]
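To make those rules concrete, here is a minimal Python sketch of a validator for the ASCII (post-ToASCII) form. It also answers the "must it start with a letter?" part of the question: RFC 1123 relaxed the older RFC 952 rule so that a label may start with a digit. The remaining constraints it enforces are assumptions drawn from RFC 952/1123 and the usual DNS limits in RFC 1035: each dot-separated label is 1 to 63 characters and may not begin or end with a hyphen, the whole name is at most 253 characters, and one trailing dot is tolerated. `is_valid_hostname` is a hypothetical helper, not a library function.

```python
import re

# Hypothetical helper: checks an ASCII hostname against the RFC 1123 rules.
# One label: starts and ends with a letter or digit, hyphens allowed inside,
# 1-63 characters in total (1 + up to 61 + 1).
_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def is_valid_hostname(hostname: str) -> bool:
    if hostname.endswith("."):
        hostname = hostname[:-1]  # tolerate a single trailing (root) dot
    # Assumed overall limit of 253 characters for the textual form.
    if not hostname or len(hostname) > 253:
        return False
    # Every dot-separated label must match the label pattern in full;
    # an empty label (e.g. "a..b") fails because the pattern needs >= 1 char.
    return all(_LABEL.fullmatch(label) for label in hostname.split("."))
```

You would run this on the punycode form: a Unicode name such as παράδειγμα.δοκιμή fails it until it has been through ToASCII.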

If you are seeing it in IDN form, the allowed characters are much more varied; see http://unicode.org/reports/tr36/idn-chars.html for a handy chart of all valid characters.

Chances are your network code will deal with the punycode, but your display code (or even code that just passes strings to and from other layers) will deal with the more human-readable form, as nobody running a server on the السعودية. domain wants to see their server listed as being on .xn--mgberp4a5d4ar.
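A minimal sketch of that network/display split in Python, assuming the built-in "idna" codec, which implements the IDNA 2003 ToASCII/ToUnicode operations from RFC 3490. The helper names are hypothetical; the newer IDNA 2008 rules are usually handled with the third-party idna package instead.

```python
# Minimal sketch using Python's built-in "idna" codec (IDNA 2003, RFC 3490).
# Helper names are hypothetical.


def to_network_form(hostname: str) -> str:
    """ToASCII: the punycode form the networking layer should use."""
    return hostname.encode("idna").decode("ascii")


def to_display_form(hostname: str) -> str:
    """ToUnicode: the human-readable form to show to players."""
    return hostname.encode("ascii").decode("idna")


if __name__ == "__main__":
    wire = to_network_form("παράδειγμα.δοκιμή")
    print(wire)                   # xn--hxajbheg2az3al.xn--jxalpdlp
    print(to_display_form(wire))  # παράδειγμα.δοκιμή
```

Plain ASCII names such as google.com pass through both functions unchanged, so the same code path serves English-only and IDN hostnames.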

Jon Hanna
Wow, I've never even heard of IDN and Punycode! For now I am planning only for English support, as it's just a pet project of mine, but I definitely learned something from your answer!
Ricket
The great thing about the way punycode works is that below a certain level (the one where you do the networking) it's all ASCII. It's pretty much a way to let ASCII-only tech work with other text (including some English words previously not allowed). As well as giving the whole world an upgrade path, it gives you one too: build it to do just what RFC 1123 says, and if you later add IDN support, the network code stays the same and you add support for RFC 3490 on top (libraries are available in many languages to help, too).
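A rough Python sketch of that upgrade path, with hypothetical names: the network layer only ever receives an ASCII hostname that passes an RFC 1123 character check, and IDN support is a single conversion step in front of it, using Python's built-in "idna" codec, which passes well-formed ASCII input through unchanged. The length limits (63 characters per label, 253 overall) are assumptions taken from the usual DNS limits.

```python
import re

# Hypothetical sketch: RFC 1123 check on the ASCII form, IDN conversion on top.
_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def rfc1123_ok(ascii_host: str) -> bool:
    """True if every label uses only A-Z, a-z, 0-9 and inner hyphens."""
    host = ascii_host[:-1] if ascii_host.endswith(".") else ascii_host
    return 0 < len(host) <= 253 and all(
        _LABEL.fullmatch(label) for label in host.split(".")
    )


def prepare_for_network(user_input: str) -> str:
    """Turn whatever the player typed into an ASCII hostname, or raise ValueError."""
    host = user_input.strip()
    try:
        # IDN layer (RFC 3490 via the built-in codec). Removing this step
        # gives the ASCII-only version; nothing downstream has to change.
        host = host.encode("idna").decode("ascii")
    except UnicodeError as exc:
        raise ValueError(f"cannot convert {user_input!r} to ASCII: {exc}") from exc
    if not rfc1123_ok(host):
        raise ValueError(f"not a valid hostname: {user_input!r}")
    return host
```

The string returned by `prepare_for_network` is what you would hand to your socket code along with the port; everything that consumes it stays ASCII-only.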
Jon Hanna