Following up to Regular expression to match hostname or IP Address? and using Restrictions on valid host names as a reference, what is the most readable, concise way to match/validate a hostname/fqdn (fully qualified domain name) in Python? I've answered with my attempt below, improvements welcome.
Process each DNS label individually by excluding invalid characters and ensuring nonzero length.
def isValidHostname(hostname):
disallowed = re.compile("[^a-zA-Z\d\-]")
return all(map(lambda x: len(x) and not disallowed.search(x), hostname.split(".")))
def isValidHostname(hostname):
if len(hostname) > 255:
return False
if hostname[-1:] == ".":
hostname = hostname[:-1] # strip exactly one dot from the right, if present
allowed = re.compile("(?!-)[A-Z\d-]{1,63}(?<!-)$", re.IGNORECASE)
return all(allowed.match(x) for x in hostname.split("."))
ensures that each segment
- contains at least one character and a maximum of 63 characters
- consists only of allowed characters
- doesn't begin or end with a hyphen.
It also avoids double negatives (not disallowed
), and if hostname
ends in a .
, that's OK, too. It will (and should) fail if hostname
ends in more than one dot.
If you're looking to validate the name of an existing host, the best way is to try to resolve it. You'll never write a regular expression to provide that level of validation.
I like the thoroughness of Tim Pietzcker's answer, but I prefer to offload some of the logic from regular expressions for readability. Honestly, I had to look up the meaning of those (?
"extension notation" parts. Additionally, I feel the "double-negative" approach is more obvious in that it limits the responsibility of the regular expression to just finding any invalid character. I do like that re.IGNORECASE allows the regex to be shortened.
So here's another shot; it's longer but it reads kind of like prose. I suppose "readable" is somewhat at odds with "concise". I believe all of the validation constraints mentioned in the thread so far are covered:
def isValidHostname(hostname):
if len(hostname) > 255:
return False
if hostname.endswith("."): # A single trailing dot is legal
hostname = hostname[:-1] # strip exactly one dot from the right, if present
disallowed = re.compile("[^A-Z\d-]", re.IGNORECASE)
return all( # Split by labels and verify individually
(label and len(label) <= 63 # length is within proper range
and not label.startswith("-") and not label.endswith("-") # no bordering hyphens
and not disallowed.search(label)) # contains only legal characters
for label in hostname.split("."))