Q:

The Hostname Regex

I'm looking for the regex to validate hostnames. It must completely conform to the standard. Right now, I have

^[0-9a-z]([0-9a-z\-]{0,61}[0-9a-z])?(\.[0-9a-z]([0-9a-z\-]{0,61}[0-9a-z])?)*$

but it allows successive hyphens and hostnames longer than 255 characters. If the perfect regex is impossible, say so.
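To see both problems concretely, here is a minimal Python sketch (Python's `re` is used only for illustration; the asker is on .NET). It assumes the stray `(` before `0-9a-z\-]` in the second group was meant to be `[`; the helper name `matches` is hypothetical.

```python
import re

# The question's pattern, with "(0-9a-z\-]" read as the intended
# character class "[0-9a-z\-]".
QUESTION_RE = re.compile(
    r"^[0-9a-z]([0-9a-z\-]{0,61}[0-9a-z])?(\.[0-9a-z]([0-9a-z\-]{0,61}[0-9a-z])?)*$"
)

def matches(name: str) -> bool:
    """Return True if the question's pattern accepts the whole string."""
    return QUESTION_RE.match(name) is not None

# Successive hyphens inside a label are accepted.
print(matches("a--b.example"))             # True

# Five valid 63-character labels joined by dots: 5*63 + 4 = 319 characters,
# well over 255, yet nothing in the pattern limits the total length.
long_name = ".".join(["a" * 63] * 5)
print(len(long_name), matches(long_name))  # 319 True
```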

Edit/Clarification: a Google search didn't reveal that this is a solved (or proven unsolvable) problem. I want to create the definitive regex so that nobody ever has to write their own. If dialects matter, I want a version for each one in which this can be done.

A: 

Take a look at the following question; a few of its answers include regular expressions for hostnames.

Could you specify what language you want to use this regex in? Most languages / systems have slightly different regex implementations that will affect people's answers.

JaredPar
I'm using .NET, but I want the regex to be as portable as possible so that other people can use it too.
CannibalSmith
A: 

Your regex was relatively close.

But see

For a hostname RE, that Perl module produces:

(?:(?:(?:(?:[a-zA-Z0-9][-a-zA-Z0-9]*)?[a-zA-Z0-9])[.])*(?:[a-zA-Z][-a-zA-Z0-9]*[a-zA-Z0-9]|[a-zA-Z])[.]?)

I would modify it as follows to be more accurate:

(?:(?:(?:(?:[a-zA-Z0-9][-a-zA-Z0-9]{0,61})?[a-zA-Z0-9])[.])*(?:[a-zA-Z][-a-zA-Z0-9]{0,61}[a-zA-Z0-9]|[a-zA-Z])[.]?)

Optionally, anchor the ends with ^ and $ so that it matches ONLY hostnames (rather than finding one inside a longer string).
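A quick Python check of what the `{0,61}` bound buys (illustrative only; `full_match` and the sample name are hypothetical): with the unbounded `*`, the Perl-generated pattern accepts a 64-character label, while the modified pattern caps every label at 63 characters. `re.fullmatch` stands in for anchoring with ^ and $.

```python
import re

# Pattern produced by the Perl module, as quoted above.
PERL_RE = r"(?:(?:(?:(?:[a-zA-Z0-9][-a-zA-Z0-9]*)?[a-zA-Z0-9])[.])*(?:[a-zA-Z][-a-zA-Z0-9]*[a-zA-Z0-9]|[a-zA-Z])[.]?)"

# The modified pattern: label bodies bounded with {0,61}.
MODIFIED_RE = r"(?:(?:(?:(?:[a-zA-Z0-9][-a-zA-Z0-9]{0,61})?[a-zA-Z0-9])[.])*(?:[a-zA-Z][-a-zA-Z0-9]{0,61}[a-zA-Z0-9]|[a-zA-Z])[.]?)"

def full_match(pattern: str, name: str) -> bool:
    # fullmatch requires the whole string to match, like wrapping in ^...$
    return re.fullmatch(pattern, name) is not None

name = "a" * 64 + ".example.com"   # first label is one character over 63

print(full_match(PERL_RE, name))      # True: unbounded * accepts the long label
print(full_match(MODIFIED_RE, name))  # False: 1 + 61 + 1 caps a label at 63
```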

I don't think a single RE can accomplish full validation: according to Wikipedia, there is a 255-character length restriction, which I don't think can be included in the same RE, at least not without a ton of changes. It's easy enough to just check that the length is <= 255 before running the RE.
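A minimal sketch of that two-step approach in Python, assuming the modified pattern above and the 255-character limit cited from Wikipedia; `is_valid_hostname` and `MAX_HOSTNAME_LEN` are hypothetical names.

```python
import re

# The modified pattern, unanchored; fullmatch below anchors both ends.
HOSTNAME_RE = re.compile(
    r"(?:(?:(?:(?:[a-zA-Z0-9][-a-zA-Z0-9]{0,61})?[a-zA-Z0-9])[.])*"
    r"(?:[a-zA-Z][-a-zA-Z0-9]{0,61}[a-zA-Z0-9]|[a-zA-Z])[.]?)"
)

MAX_HOSTNAME_LEN = 255  # limit cited from Wikipedia in the answer

def is_valid_hostname(name: str) -> bool:
    # Cheap length check first; the regex then only validates the labels.
    if len(name) > MAX_HOSTNAME_LEN:
        return False
    return HOSTNAME_RE.fullmatch(name) is not None

print(is_valid_hostname("www.example.com"))          # True
print(is_valid_hostname(".".join(["a" * 63] * 4)))   # True  (255 chars)
print(is_valid_hostname(".".join(["a" * 63] * 5)))   # False (319 chars)
```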

nicerobot
A:

^(?=.{1,255}$)[0-9A-Za-z](?:(?:[0-9A-Za-z]|\b-){0,61}[0-9A-Za-z])?(?:\.[0-9A-Za-z](?:(?:[0-9A-Za-z]|\b-){0,61}[0-9A-Za-z])?)*\.?$
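For illustration, the same pattern exercised from Python (an assumption; the asker targets .NET). The `(?=.{1,255}$)` lookahead enforces the overall length, and `\b-` allows a hyphen only directly after a letter or digit, which rules out successive hyphens. The sketch uses `fullmatch` because in Python `$` also matches just before a trailing newline; `is_hostname` is a hypothetical helper.

```python
import re

# The pattern above, verbatim, split over two adjacent raw-string literals.
HOSTNAME_RE = re.compile(
    r"^(?=.{1,255}$)[0-9A-Za-z](?:(?:[0-9A-Za-z]|\b-){0,61}[0-9A-Za-z])?"
    r"(?:\.[0-9A-Za-z](?:(?:[0-9A-Za-z]|\b-){0,61}[0-9A-Za-z])?)*\.?$"
)

def is_hostname(name: str) -> bool:
    # fullmatch rather than match: with match(), the $ anchors would also
    # succeed just before a final "\n", so "example.com\n" would pass.
    return HOSTNAME_RE.fullmatch(name) is not None

print(is_hostname("example.com"))              # True
print(is_hostname("example.com."))             # True  (trailing dot)
print(is_hostname("a--b.com"))                 # False (successive hyphens)
print(is_hostname("a" * 64 + ".com"))          # False (label over 63 chars)
print(is_hostname(".".join(["a" * 63] * 5)))   # False (319 chars > 255)
```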

CannibalSmith
It doesn't accept domains with a trailing ".", but otherwise it works.
nicerobot
Fixed. I wonder whether the length assertion should instead check for 254 characters or fewer, excluding the trailing dot, rather than 255 or fewer. Otherwise someone down the line might append the trailing dot to a maximum-length hostname and break it.
CannibalSmith
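A hypothetical Python sketch of the variant proposed in that last comment: the lookahead `(?=.{1,254}\.?$)` counts at most 254 characters and then permits one optional trailing dot on top, so appending the root dot to a maximum-length name no longer invalidates it. The 254 figure is taken from the comment as-is, not verified here; `is_hostname`, `name254` and `name255` are illustrative names.

```python
import re

# Hypothetical variant of the accepted pattern: up to 254 characters
# (the comment's figure) plus one optional trailing dot, instead of
# 255 characters including that dot.
HOSTNAME_RE = re.compile(
    r"^(?=.{1,254}\.?$)[0-9A-Za-z](?:(?:[0-9A-Za-z]|\b-){0,61}[0-9A-Za-z])?"
    r"(?:\.[0-9A-Za-z](?:(?:[0-9A-Za-z]|\b-){0,61}[0-9A-Za-z])?)*\.?$"
)

def is_hostname(name: str) -> bool:
    # fullmatch so that a trailing "\n" is not silently accepted by $
    return HOSTNAME_RE.fullmatch(name) is not None

name254 = ".".join(["a" * 63] * 3 + ["a" * 62])  # 3*63 + 62 + 3 dots = 254
name255 = ".".join(["a" * 63] * 4)               # 4*63 + 3 dots = 255

print(len(name254), is_hostname(name254), is_hostname(name254 + "."))  # 254 True True
print(len(name255), is_hostname(name255))                              # 255 False
```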