ansaurus

Question

Answer 1

A:

Process each DNS label individually by excluding invalid characters and ensuring nonzero length.


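import re

def isValidHostname(hostname):
    disallowed = re.compile(r"[^a-zA-Z0-9-]")  # anything but ASCII letters, digits, hyphen
    # each dot-separated label must be non-empty and contain no disallowed character
    return all(map(lambda x: len(x) and not disallowed.search(x), hostname.split(".")))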

kostmo 2010-03-28 06:01:46

`return all(x and not disallowed.search(x) for x in hostname.split("."))`

Roger Pate 2010-03-28 06:44:01
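Folding Roger Pate's suggestion back into the function gives the sketch below. An empty label is falsy, so `x and ...` replaces the explicit `len()` check. The character class spells out `0-9` rather than `\d`, on the assumption of Python 3, where `\d` also matches non-ASCII digits. The sample calls show the gaps that the following comments point out: a legal trailing dot is rejected and hyphens at label boundaries slip through.

```python
import re

# Any character other than an ASCII letter, digit or hyphen is invalid.
DISALLOWED = re.compile(r"[^a-zA-Z0-9-]")

def isValidHostname(hostname):
    # An empty label ("") is falsy, so it fails without an explicit len() check.
    return all(x and not DISALLOWED.search(x) for x in hostname.split("."))

print(isValidHostname("www.example.com"))    # True
print(isValidHostname("www..example.com"))   # False: empty label
print(isValidHostname("example.com."))       # False, although a single trailing dot is legal
print(isValidHostname("-bad-.example.com"))  # True, although labels may not start/end with "-"
```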

A trailing `.` on the end of a hostname is valid. Oh, and there's much more work to do if you want to support IDN, of course...

bobince 2010-03-28 11:10:21
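One way to approach the IDN point, assuming IDNA 2003 semantics are acceptable, is to convert the name to its ASCII (Punycode) form with Python's built-in `idna` codec and then apply the usual per-label checks. The helper names `to_ascii_hostname` and `is_valid_idn_hostname` are hypothetical. If IDNA 2008 rules are needed, the third-party `idna` package would be required instead of the built-in codec.

```python
import re

# One ASCII label: 1-63 letters, digits or hyphens, no hyphen at either end.
# re.ASCII keeps \d and IGNORECASE from matching non-ASCII characters.
LABEL = re.compile(r"(?!-)[A-Z\d-]{1,63}(?<!-)\Z", re.IGNORECASE | re.ASCII)

def to_ascii_hostname(hostname):
    """Return the ASCII (Punycode) form of hostname, or None if it cannot be encoded.

    Uses Python's built-in "idna" codec, which implements IDNA 2003 (RFC 3490).
    """
    try:
        return hostname.encode("idna").decode("ascii")
    except UnicodeError:
        return None

def is_valid_idn_hostname(hostname):
    ascii_name = to_ascii_hostname(hostname)
    if ascii_name is None or len(ascii_name) > 255:
        return False
    if ascii_name.endswith("."):
        ascii_name = ascii_name[:-1]  # a single trailing dot is legal
    return all(LABEL.match(label) for label in ascii_name.split("."))

print(to_ascii_hostname("bücher.example"))         # xn--bcher-kva.example
print(is_valid_idn_hostname("bücher.example"))     # True
print(is_valid_idn_hostname("bad_label.example"))  # False
```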

Answer 2

A:

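import re

def isValidHostname(hostname):
    if len(hostname) > 255:
        return False
    if hostname[-1:] == ".":
        hostname = hostname[:-1] # strip exactly one dot from the right, if present
    # \Z rather than $: $ would also accept a label followed by a trailing newline.
    # re.ASCII keeps \d and IGNORECASE from matching non-ASCII characters.
    allowed = re.compile(r"(?!-)[A-Z\d-]{1,63}(?<!-)\Z", re.IGNORECASE | re.ASCII)
    return all(allowed.match(x) for x in hostname.split("."))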

This ensures that each segment:

- contains at least one and at most 63 characters,
- consists only of allowed characters, and
- doesn't begin or end with a hyphen.

It also avoids double negatives (`not disallowed`), and if `hostname` ends in a single `.`, that's OK, too. It will (and should) fail if `hostname` ends in more than one dot.
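To make the three per-segment rules concrete, here is the label pattern on its own, annotated piece by piece and run against a few sample labels. It is a sketch that assumes Python 3 and uses `\Z` and `re.ASCII`, which go slightly beyond the regex as first posted, so that a trailing newline and non-ASCII look-alike characters are also rejected.

```python
import re

# One DNS label:
#   (?!-)            negative lookahead: the label must not start with "-"
#   [A-Z\d-]{1,63}   letters, digits or hyphens, between 1 and 63 of them
#   (?<!-)           negative lookbehind: the label must not end with "-"
#   \Z               true end of string (unlike $, no trailing "\n" allowed)
LABEL = re.compile(r"(?!-)[A-Z\d-]{1,63}(?<!-)\Z", re.IGNORECASE | re.ASCII)

for label in ["www", "my-host", "x" * 63, "", "x" * 64, "-www", "www-", "w_w"]:
    print(repr(label), bool(LABEL.match(label)))
```

The first three labels are accepted; the empty label, the 64-character label, the two hyphen-bordered labels and the one containing `_` are rejected.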

Tim Pietzcker 2010-03-28 08:52:51

Hostname labels should also not end with a hyphen.

bobince 2010-03-28 11:09:57

Right, thanks. Edited my answer.

Tim Pietzcker 2010-03-28 11:48:43

You're using `re.match` incorrectly: note that `re.match("a+", "ab")` is a match whereas `re.match("a+$", "ab")` isn't. Your function also does not allow for a single dot at the end of the hostname.

AndiDog 2010-03-28 12:02:48
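AndiDog's point about anchoring can be checked directly in the interpreter. The sketch below also shows a detail the comment does not mention: `$` matches just before a trailing newline, so `\Z` or `re.fullmatch` (Python 3.4+) is the stricter choice when the whole string must match.

```python
import re

# re.match anchors only at the start of the string, not at the end.
print(bool(re.match("a+", "ab")))         # True: "a" matches, "b" is ignored
print(bool(re.match("a+$", "ab")))        # False: $ forces the match to the end

# $ also matches just before a trailing newline ...
print(bool(re.match("a+$", "aa\n")))      # True
# ... whereas \Z and re.fullmatch require the very end of the string.
print(bool(re.match(r"a+\Z", "aa\n")))    # False
print(bool(re.fullmatch("a+", "aa\n")))   # False
```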

I had been under the impression that `re.match` needs to match the entire string, therefore making the end-of-string anchor unnecessary. But as I now found out (thanks!) it only binds the match to the start of the string. I corrected my regex accordingly. I don't get your second point, however. Is it legal to end a hostname in a dot? The Wikipedia article linked in the question appears to say no.

Tim Pietzcker 2010-03-28 12:22:17

@Tim Pietzcker Yes, a single dot at the end is legal. It marks the name as a fully-qualified domain name, which lets the DNS system know that it shouldn't try appending the local domain to it.

Daniel Stutzbach 2010-03-28 13:16:59
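Because only one trailing dot is legal, the answers strip exactly one character instead of calling `str.rstrip(".")`, which would remove every trailing dot and let `example.com..` through. A small sketch of the difference; the helper name `strip_root_dot` is hypothetical.

```python
def strip_root_dot(hostname):
    """Remove exactly one trailing dot (the FQDN marker), if present."""
    return hostname[:-1] if hostname.endswith(".") else hostname

print(strip_root_dot("example.com."))    # example.com
print(strip_root_dot("example.com.."))   # example.com.  (empty last label remains, so validation fails)
print("example.com..".rstrip("."))       # example.com   (rstrip would hide the error)
```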

Note that there's also a 63-character limit for each segment, and a global 255-character limit for the whole hostname.

Romuald Brunet 2010-03-28 13:19:19
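A further subtlety, going slightly beyond the comment: RFC 1035's 255-octet limit is counted on the name in DNS wire format, where each label is preceded by a one-octet length and the name ends with a zero-length root label. In that encoding a name takes two octets more than its dotted text form without the trailing dot, so the text form is effectively limited to 253 characters. A sketch of that calculation, assuming an ASCII name with no empty labels (the helper name `wire_length` is hypothetical):

```python
def wire_length(hostname):
    """Octets needed for hostname in DNS wire format (RFC 1035).

    Each label costs one length octet plus its characters, and the name is
    terminated by a zero-length root label (one more octet). Assumes an ASCII
    hostname without empty labels; a single trailing dot is ignored.
    """
    if hostname.endswith("."):
        hostname = hostname[:-1]
    return sum(1 + len(label) for label in hostname.split(".")) + 1

print(wire_length("example.com"))  # 13, i.e. len("example.com") + 2

# 253 text characters is the longest name that still fits in 255 octets.
longest = ".".join(["a" * 63] * 3 + ["a" * 61])
print(len(longest), wire_length(longest))  # 253 255
```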

Aw shucks. Another edit :)

Tim Pietzcker 2010-03-28 14:30:00

Answer 3

A:

If you're looking to validate the name of an existing host, the best way is to try to resolve it. You'll never write a regular expression to provide that level of validation.

Donal Fellows 2010-03-28 11:51:38
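Answer 3's suggestion can be sketched with the standard `socket.getaddrinfo`, which asks the system resolver (DNS, the hosts file, and so on) for addresses. The helper name `resolves` is hypothetical. A `False` result may reflect a temporary network problem rather than a malformed or unregistered name, and the call can block while the resolver waits.

```python
import socket

def resolves(hostname):
    """Return True if the system resolver can find an address for hostname."""
    try:
        socket.getaddrinfo(hostname, None)
    except (socket.gaierror, UnicodeError):
        # gaierror: unknown name or failed lookup; UnicodeError: the name could
        # not be IDNA-encoded before the lookup (e.g. an empty label, "a..b")
        return False
    return True

print(resolves("127.0.0.1"))  # True: a numeric address needs no DNS lookup
```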

And what if he wants to find out if a hostname that does not yet exist will be a legal one? The RFC appears to be quite straightforward, so I don't see why a regex wouldn't work.

Tim Pietzcker 2010-03-28 12:25:41

Depends on what you're trying to show. If the name doesn't resolve then who knows what it “means”; the true means of validation require information that a regular expression cannot have (i.e., access to DNS). It's easier to just try it and handle the failure. And when thinking about names that are potentially legal but not yet, the only people who actually need to care about that are the registrars. Everyone else should leave these things to the code that is designed to have genuine expertise in the area. As JWZ notes, applying an RE turns a problem into two problems. (Well, mostly…)

Donal Fellows 2010-03-28 14:01:18

I do not agree. There are two separate concerns, and both are valid: (1) arguing whether a given string can, technically and plausibly, serve as a valid email address, hostname, or the like; (2) demonstrating that a given name is taken, or likely free. (1) is purely a syntactic consideration. Since (2) happens over the network, there is a modicum of doubt: a host that is up now can be down a second later, and a domain I order now can be taken by the time my mail arrives.

flow 2010-03-28 15:37:53

This approach has been proposed in a similar question (http://stackoverflow.com/questions/399932/can-i-improve-this-regex-check-for-valid-domain-names/401132#401132), and there is even a Python project to facilitate this (http://code.google.com/p/python-public-suffix-list/). I've modified the question title slightly, since I'm not interested in a solution that requires network lookups.

kostmo 2010-03-28 20:29:44

Answer 4

A:

I like the thoroughness of Tim Pietzcker's answer, but I prefer to offload some of the logic from regular expressions for readability. Honestly, I had to look up the meaning of those `(?...)` "extension notation" parts. Additionally, I feel the "double-negative" approach is more obvious in that it limits the responsibility of the regular expression to just finding any invalid character. I do like that `re.IGNORECASE` allows the regex to be shortened.

So here's another shot; it's longer but it reads kind of like prose. I suppose "readable" is somewhat at odds with "concise". I believe all of the validation constraints mentioned in the thread so far are covered:


import re

def isValidHostname(hostname):
    if len(hostname) > 255:
        return False
    if hostname.endswith("."): # A single trailing dot is legal
        hostname = hostname[:-1] # strip exactly one dot from the right, if present
    # re.ASCII keeps \d and IGNORECASE from treating non-ASCII characters as legal
    disallowed = re.compile(r"[^A-Z\d-]", re.IGNORECASE | re.ASCII)
    return all( # Split by labels and verify individually
        (label and len(label) <= 63 # length is within proper range
         and not label.startswith("-") and not label.endswith("-") # no bordering hyphens
         and not disallowed.search(label)) # contains only legal characters
        for label in hostname.split("."))

kostmo 2010-03-28 21:06:51
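Since this answer aims to cover the same constraints as Tim Pietzcker's in a more readable form, one way to gain some confidence is to run both over the same inputs and check that they agree. The harness below is a hypothetical sketch: `regex_version` and `prose_version` are made-up names, and both copies add `re.ASCII` (the regex copy also uses `\Z`), which neither answer had as posted. Without `re.ASCII`, `[A-Z]` under `re.IGNORECASE` also matches a few non-ASCII letters such as the Kelvin sign (U+212A), which is why it appears among the samples.

```python
import re

def regex_version(hostname):
    """Single-regex approach (Answer 2), with \\Z and re.ASCII."""
    if len(hostname) > 255:
        return False
    if hostname[-1:] == ".":
        hostname = hostname[:-1]
    allowed = re.compile(r"(?!-)[A-Z\d-]{1,63}(?<!-)\Z", re.IGNORECASE | re.ASCII)
    return all(allowed.match(x) for x in hostname.split("."))

def prose_version(hostname):
    """Step-by-step approach (Answer 4), with re.ASCII."""
    if len(hostname) > 255:
        return False
    if hostname.endswith("."):
        hostname = hostname[:-1]
    disallowed = re.compile(r"[^A-Z\d-]", re.IGNORECASE | re.ASCII)
    return all(
        (label and len(label) <= 63
         and not label.startswith("-") and not label.endswith("-")
         and not disallowed.search(label))
        for label in hostname.split("."))

SAMPLES = ["example.com", "example.com.", "example.com..", "", ".",
           "-a.com", "a-.com", "a_b.com", "x" * 63 + ".com", "x" * 64 + ".com",
           "example.com\n", "ex\u212aample.com"]

for name in SAMPLES:
    assert regex_version(name) == prose_version(name), repr(name)
print("both versions agree on", len(SAMPLES), "samples")
```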

You don't need the backslashes as line continuations; lines are joined implicitly inside the enclosing parentheses.

Tim Pietzcker 2010-03-29 06:09:15
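A minimal illustration of the implicit line joining described above (the variable names are illustrative only): inside parentheses, brackets or braces, Python continues the expression across lines, and comments may follow each line.

```python
labels = ["www", "example", "com"]

# The open parenthesis of all(...) lets the expression span several lines
# without any backslash continuation.
ok = all(
    label                  # an empty label is falsy
    and len(label) <= 63   # labels are at most 63 characters
    for label in labels)

print(ok)  # True
```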

Good to know. I've removed them.

kostmo 2010-03-29 08:14:44

validate hostname string in Python
