I am using Python and would like a simple API or regex to check a domain name's validity. By validity I mean syntactic validity, not whether the domain name actually exists on the Internet.

A: 

Any domain name is (syntactically) valid if it's a dot-separated list of identifiers, each no longer than 63 characters, and made up of letters, digits and dashes (no underscores).

So:

r'^[a-zA-Z\d-]{1,63}(\.[a-zA-Z\d-]{1,63})*$'

would be a start. Of course, these days some non-ASCII characters may be allowed (internationalized domain names, a very recent development), which changes the parameters a lot. Do you need to deal with that?
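A minimal sketch of that idea in Python 3. The helper name `is_valid_domain` is made up for illustration. It splits the name on dots and checks each label against the "1 to 63 letters, digits or hyphens" rule with `re.fullmatch`, so empty labels (as in `a..b`) are rejected; hyphen placement is deliberately left unchecked, in the liberal spirit of this answer. As an assumption, non-ASCII names are first converted with Python's built-in `idna` codec (IDNA 2003), which turns `bücher` into `xn--bcher-kva`; drop that step if you only care about plain ASCII names.

```python
import re

# One label: 1-63 letters, digits or hyphens (no underscores).
# Leading/trailing hyphens and the overall length limit are not checked here.
LABEL_RE = re.compile(r"[a-zA-Z\d-]{1,63}")


def is_valid_domain(name: str) -> bool:
    """Return True if `name` looks syntactically like a domain name (sketch)."""
    try:
        # Assumption: convert internationalized names to their ASCII "xn--"
        # form first; the codec raises UnicodeError for empty or over-long labels.
        ascii_name = name.encode("idna").decode("ascii")
    except UnicodeError:
        return False
    # Every dot-separated label must match the rule; "" and "a..b" fail here.
    return all(LABEL_RE.fullmatch(label) for label in ascii_name.split("."))


print(is_valid_domain("www.google.com"))
print(is_valid_domain("under_score.com"))
```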

Alex Martelli
Can an identifier start or end with a hyphen?
Amarghosh
Thanks! No, I don't. I just need a basic sanity check to ensure that it does not contain blacklisted characters such as ' ! " etc.
demos
Alex, I know you are an App Engine guru; please help me with this one: http://stackoverflow.com/questions/2894808/creating-auto-incrementing-column-in-google-appengine Thanks in advance!
demos
@Amarghosh, per RFC 1035, yes; but the RFC also says "when assigning a domain name for an object, the prudent user will select a name" that follows stricter rules (in particular, each identifier, which the RFC calls a 'label', should start with a letter, and the whole domain name should stay within 255 bytes). "Be conservative in what you generate and liberal in what you accept"!-) Since an RE is about accepting, it had better be liberal.
Alex Martelli
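For contrast, the stricter "prudent" rules described in that comment could be sketched like this. The function name `is_prudent_domain` is hypothetical, and the 255 figure is simply the limit quoted in the comment, compared against the character count (which equals the byte count for the ASCII-only names this pattern accepts).

```python
import re

# "Prudent" label: starts with a letter, ends with a letter or digit,
# only letters, digits and hyphens in between, at most 63 characters.
PRUDENT_LABEL = re.compile(r"[a-zA-Z](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?")
MAX_NAME_LENGTH = 255  # overall limit quoted in the comment above


def is_prudent_domain(name: str) -> bool:
    """Conservative syntax check (sketch): every label must follow the prudent rule."""
    if not name or len(name) > MAX_NAME_LENGTH:
        return False
    return all(PRUDENT_LABEL.fullmatch(label) for label in name.split("."))


print(is_prudent_domain("www.example.com"))
print(is_prudent_domain("-abc.com"))
```

Keeping a liberal check for accepting input and a prudent one for names you generate yourself is one way to follow the "conservative/liberal" advice.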
@demos, I see you got a good answer to that other question (I was asleep by the time you asked it;-).
Alex Martelli
@alex Yup :) I have two more for you: http://stackoverflow.com/questions/2906908/searching-through-model-relationships-in-google-app-engine http://stackoverflow.com/questions/2906746/updating-model-schema-in-google-app-engine Thanks!
demos
You got two perfectly correct answers to those two questions, too (even though you apparently don't like them, I can't add anything to those answers).
Alex Martelli
A: 

It seems this has already been discussed here.

Incognito
A: 
r'^(?=.{4,255}$)([a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?\.)+[a-zA-Z0-9]{2,63}$'
  • The lookahead makes sure the name has a minimum of 4 characters (a.in) and a maximum of 255.
  • One or more labels, each followed by a period, of 1 to 63 characters, starting and ending with an alphanumeric character and containing alphanumeric characters and hyphens in the middle. The middle part is optional, so one-character labels such as the "a" in a.in also match.
  • Followed by a top-level domain name of 2 to 63 characters (a cap of 5 would already reject museum, which has 6).
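A quick way to exercise a pattern like this. The sketch below uses a variant in which the middle of each label is optional (so one-character labels such as the "a" in a.in match) and the final label may be 2 to 63 characters long; those two choices are assumptions. It calls `re.fullmatch` rather than `re.match`, because `$` also matches just before a trailing newline, so "a.in\n" would otherwise slip through.

```python
import re

DOMAIN_RE = re.compile(
    r"^(?=.{4,255}$)"                                       # total length 4..255
    r"([a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?\.)+"   # labels, each ending in '.'
    r"[a-zA-Z0-9]{2,63}$"                                   # top-level domain
)


def matches(name: str) -> bool:
    # fullmatch rejects trailing newlines that a bare `$` would tolerate.
    return DOMAIN_RE.fullmatch(name) is not None


for candidate in ["a.in", "www.google.com", "example.museum", "-bad.com", "nodot"]:
    print(candidate, matches(candidate))
```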
Amarghosh
A: 

Note that while you can do something with regular expressions, the most reliable way to test for valid domain names is to actually try to resolve the name (with socket.getaddrinfo):

import socket

try:
    result = socket.getaddrinfo("www.google.com", None)
    print(result[0][4])  # socket address of the first result
except socket.gaierror:
    print("could not resolve")

Note that this can leave you open to a denial-of-service attack (if someone submits thousands of invalid domain names, resolving them can take a while), but you could simply rate-limit anyone who tries this.
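The rate-limiting suggestion could be sketched as a small sliding-window limiter keyed by client. Everything here (`RateLimiter`, `allow`, the default of 10 lookups per 60 seconds) is a hypothetical illustration, not part of any library; in a web app the client id would typically be something like the requester's IP address. The optional `now` argument only exists to make the behaviour easy to test.

```python
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Optional


class RateLimiter:
    """Allow at most `max_calls` lookups per client within `window` seconds (sketch)."""

    def __init__(self, max_calls: int = 10, window: float = 60.0) -> None:
        self.max_calls = max_calls
        self.window = window
        # client id -> timestamps of that client's recent allowed calls
        self._calls: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, client_id: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        calls = self._calls[client_id]
        # Forget calls that have dropped out of the sliding window.
        while calls and now - calls[0] >= self.window:
            calls.popleft()
        if len(calls) >= self.max_calls:
            return False
        calls.append(now)
        return True
```

You would call `limiter.allow(client_id)` before `getaddrinfo` and refuse the request when it returns False. This version is not thread-safe; a multi-threaded server would need a lock around `allow`.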

The advantage is that it will flag a typo such as "hotmail.con" (for "hotmail.com") as invalid, whereas a regex would accept "hotmail.con" as valid.
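One way to combine both approaches, sketched here as an assumption rather than taken from the answer: run a cheap syntax check first, so obviously malformed names never reach the resolver (which also narrows the DoS window mentioned above), and only then call `socket.getaddrinfo`. The helper `domain_status` and its return strings are made up for illustration, and the label rule is the liberal 1 to 63 letters/digits/hyphens one from the first answer.

```python
import re
import socket

# Liberal label rule: 1-63 letters, digits or hyphens.
LABEL_RE = re.compile(r"[a-zA-Z0-9-]{1,63}")


def domain_status(name: str) -> str:
    """Return 'bad-syntax', 'unresolvable' or 'resolves' (hypothetical helper)."""
    if not name or len(name) > 255 or not all(
        LABEL_RE.fullmatch(label) for label in name.split(".")
    ):
        return "bad-syntax"  # rejected without any network traffic
    try:
        socket.getaddrinfo(name, None)  # may block while the resolver works
    except (socket.gaierror, UnicodeError):
        return "unresolvable"
    return "resolves"
```

With this, "bad!name.com" is rejected immediately, while a well-formed typo like "hotmail.con" still goes to the resolver and comes back as unresolvable.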

Dean Harding