Is there a programmatic way to find the domain name from a given hostname?

Given www.yahoo.co.jp, return yahoo.co.jp.

The approach that works but is very slow is:

Split the hostname on ".", remove one label from the left, join the rest, and query for an SOA record using dnspython. Repeat until a valid SOA record is returned, and consider that name the domain.
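For reference, the walk described above might look roughly like the sketch below. The function names `find_zone` and `dns_has_soa` are made up for illustration. The SOA check is passed in as a callable so the loop can be tried without network access. This version tests the full hostname before stripping any label, and it does not deal with timeouts or CNAME chains.

```python
def find_zone(hostname, has_soa):
    """Return the closest enclosing name for which has_soa() is true, or None."""
    labels = hostname.rstrip(".").split(".")
    # Try the full name first, then drop one label from the left each round.
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if has_soa(candidate):
            return candidate
    return None


def dns_has_soa(name):
    """SOA check via dnspython (assumed installed; needs network access)."""
    import dns.resolver  # imported lazily so find_zone() works without it

    try:
        # resolve() is the dnspython 2.x name; 1.x releases call it query().
        dns.resolver.resolve(name, "SOA")
        return True
    except (dns.resolver.NoAnswer, dns.resolver.NXDOMAIN):
        # No SOA at this name; other errors (timeouts etc.) propagate.
        return False
```

Calling `find_zone("www.yahoo.co.jp", dns_has_soa)` would perform one lookup per label, which is where the slowness comes from. dnspython also ships `dns.resolver.zone_for_name`, which performs a similar walk and may be worth trying before hand-rolling one.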

Is there a cleaner/faster way to do this without using regexps?

+1  A: 

You can use partition instead of split:

>>> 'www.yahoo.co.jp'.partition('.')[2]
'yahoo.co.jp'

This will help with the parsing but obviously won't check if the returned string is a valid domain.
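To make that caveat concrete, here is a small sketch (the helper name `strip_leftmost_label` is only for illustration) showing that partition always removes exactly one label, whether or not the result is what you would call the domain:

```python
def strip_leftmost_label(hostname):
    """Drop everything up to and including the first dot."""
    return hostname.partition(".")[2]


print(strip_leftmost_label("www.yahoo.co.jp"))  # yahoo.co.jp
print(strip_leftmost_label("yahoo.co.jp"))      # co.jp
print(strip_leftmost_label("localhost"))        # empty string, no dot found
```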

Dave Webb
The string will always be a valid domain, but nothing guarantees it will be a zone.
bortzmeyer
+5  A: 

There's no trivial definition of which "domain name" is the parent of any particular "host name".

Your current method of traversing up the tree until you see an SOA record is actually the most correct.

Technically, what you're doing there is finding a "zone cut", and in the vast majority of cases that will correspond to the point at which the domain was delegated from its TLD.

Any method that relies on mere text parsing of the host name without reference to the DNS is doomed to failure.

Alternatively, make use of the centrally maintained lists of delegation-centric domains from http://publicsuffix.org/, but beware that these lists can be incomplete and/or out of date.
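A list-based lookup could be sketched as follows. The `SUFFIXES` set is a tiny hand-picked excerpt used only for illustration, and `registrable_domain` is a hypothetical helper name. The sketch handles plain rules only and ignores the wildcard and exception rules that the real list also contains, so treat it as an approximation rather than a full implementation of the list's matching algorithm.

```python
# Tiny illustrative excerpt; a real program would load the full list
# published at publicsuffix.org and refresh it regularly.
SUFFIXES = {"com", "jp", "co.jp", "uk", "co.uk"}


def registrable_domain(hostname, suffixes=SUFFIXES):
    """Return the longest matching public suffix plus one label, or None."""
    labels = hostname.lower().rstrip(".").split(".")
    # Scanning from the left finds the longest matching suffix first.
    for i in range(len(labels)):
        if ".".join(labels[i:]) in suffixes:
            if i == 0:
                return None  # the hostname is itself a public suffix
            return ".".join(labels[i - 1:])
    return None  # no known suffix matched


print(registrable_domain("www.yahoo.co.jp"))  # yahoo.co.jp
```

This needs no DNS queries, so it is fast, but the answer is only as good as the copy of the list it was given.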

See also this question where all of this has been gone over before...

Alnitak
Can you explain the question and your answer? I'm not sure what's going on.
Unknown
A _zone_ has to have an SOA record, but you can have many levels of "label" beneath a zone. If you've got a.b.c.d.example.com, the only way to know that the actual zone is example.com is to strip off each label in turn until you find an SOA record.
Alnitak
+1  A: 

Your algorithm is the right one. Since zone cuts are not reflected in the domain name (you see domain cuts - the dots - but not zone cuts), it is the only correct one.

An approximate algorithm is to use a list of zones, like the one mentioned by Alnitak. Remember that these static lists are not authoritative: they lack many registries, they go stale, and so on.

bortzmeyer