
I'm writing some code that processes URLs, and I want to make sure I'm not leaving out some strange case...

Are there any valid characters for a host other than A-Z, 0-9, "-" and "."?

(This includes anything that can be in subdomains, etc. Essentially, anything between :// and the first /.)
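One wrinkle worth checking: in a general URL, the text between `://` and the first `/` is the *authority*, which can also hold user info (`user:pass@`), a port (`:8080`), or a bracketed IPv6 literal. That means characters like `@`, `:` and `[` can legitimately appear there even though they are not part of the host name. A minimal sketch using Python's standard `urllib.parse.urlsplit` to isolate the host before validating it; the helper name `host_of` and the example URLs are made up for illustration:

```python
from typing import Optional
from urllib.parse import urlsplit

def host_of(url: str) -> Optional[str]:
    """Return only the host from a URL's authority, dropping user info and port.

    urllib lower-cases the result and strips the brackets from IPv6 literals.
    """
    return urlsplit(url).hostname

# Illustrative URL: the authority holds user info, a host, and a port.
parts = urlsplit("http://user:pass@Sub.Example.com:8080/path?q=1")
print(parts.netloc)    # user:pass@Sub.Example.com:8080
print(parts.hostname)  # sub.example.com
print(parts.port)      # 8080

print(host_of("http://[::1]:8080/"))  # ::1
```

So whether "anything between :// and the first /" is limited to letters, digits, "-" and "." depends on whether your URLs can carry user info, ports, or IPv6 addresses.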

Thanks!

+1  A: 

No, that is all that is allowed.

Here is a reference if you'd like to read it: http://www.ietf.org/rfc/rfc1034.txt

Russ Bradberry
+7  A: 

Please see Restrictions on valid host names:

Hostnames are composed of a series of labels concatenated with dots, as are all domain names[1]. For example, "en.wikipedia.org" is a hostname. Each label must be between 1 and 63 characters long, and the entire hostname has a maximum of 255 characters.

RFCs mandate that a hostname's labels may contain only the ASCII letters 'a' through 'z' (case-insensitive), the digits '0' through '9', and the hyphen. Hostname labels cannot begin or end with a hyphen. No other symbols, punctuation characters, or blank spaces are permitted.
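The rules quoted above translate fairly directly into a check. A minimal Python sketch, assuming the host has already been extracted from the URL; the names `LABEL_RE` and `is_valid_hostname` are made up for illustration, and the 255-character limit is taken from the quote as-is:

```python
import re

# One label: 1-63 ASCII letters, digits, or hyphens, not starting or ending with a hyphen.
LABEL_RE = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")

def is_valid_hostname(hostname: str) -> bool:
    """Check a hostname against the letter/digit/hyphen rules quoted above.

    A sketch only: it does not handle a trailing dot, IPv6 literals,
    or Unicode (IDN) names.
    """
    if not hostname or len(hostname) > 255:
        return False
    # fullmatch (rather than match with ^...$) so a trailing newline is not accepted
    return all(LABEL_RE.fullmatch(label) for label in hostname.split("."))

print(is_valid_hostname("en.wikipedia.org"))  # True
print(is_valid_hostname("-bad.example.com"))  # False (leading hyphen)
print(is_valid_hostname("under_score.com"))   # False (underscore)
print(is_valid_hostname("a" * 64 + ".com"))   # False (label longer than 63)
```

Splitting on "." and checking each label separately keeps the per-label rules (length, no leading or trailing hyphen) simple; an empty label, as in "a..b", fails the regex and so is rejected.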

Andrew Hare
Thank you!
Daniel Magliola
Not a problem - glad to help!
Andrew Hare
+1  A: 

It depends on the level at which you do the validation (before or after the URL escaping). If you are validating user input, it can go well beyond ASCII (with big chunks of Unicode).

See http://en.wikipedia.org/wiki/Internationalized_domain_name

If you validate after all the escaping and the "punycode" conversion are done, there is no point in validating, since the result is already guaranteed to contain only the characters allowed by the old RFC.
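To see the two levels side by side: Python's built-in `idna` codec (which follows the older IDNA 2003 rules; the separate third-party `idna` package implements IDNA 2008) converts a Unicode host name into its ASCII "punycode" form and back. A minimal sketch; the host name is just an illustrative example:

```python
# Python's built-in "idna" codec follows the older IDNA 2003 rules (RFC 3490).
unicode_host = "bücher.example"  # illustrative internationalized host name

# User-facing form -> ASCII form actually used in DNS
ascii_host = unicode_host.encode("idna").decode("ascii")
print(ascii_host)  # xn--bcher-kva.example

# And back again
print(ascii_host.encode("ascii").decode("idna"))  # bücher.example
```

The user-facing form contains "ü", which a letters/digits/hyphen check would reject, while the converted form is plain ASCII, which is why it matters which of the two your code sees.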

Mihai Nita
Hmm, good point. I need to look into whether this applies to me. I'm not sure what you mean by before or after the escaping, or how it applies to my particular (somewhat unusual) situation. I'll have to experiment with this. Thank you!
Daniel Magliola
Mihai Nita