ansaurus

Question

Get Root Domain of Link

Answer 1

+5 A:

The urlparse module in the standard library might do the trick.

http://docs.python.org/library/urlparse.html
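In Python 3 the linked urlparse module became urllib.parse. Here is a minimal sketch, using an arbitrary example URL, of why the hostname attribute is the useful one for this problem. The netloc attribute keeps any user info and port, while hostname strips them and lowercases the result. A link without a scheme, such as "www.example.com", is parsed as a path, so its hostname is None.

```python
from urllib.parse import urlparse

# Arbitrary example URL with user info, mixed case and a port
parts = urlparse("http://user@WWW.Example.com:8080/path?q=1")

print(parts.netloc)    # user@WWW.Example.com:8080  (raw authority section)
print(parts.hostname)  # www.example.com  (user info and port removed, lowercased)
print(parts.port)      # 8080

# Without "scheme://" the whole string ends up in .path, not .netloc
bare = urlparse("www.example.com")
print(bare.hostname)   # None
```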

Eli 2009-10-05 18:27:19

Thanks. I knew about this library but for some reason it didn't cross my mind when thinking about this problem.

Gavin Schulz 2009-10-05 18:34:48

Answer 2

A:

Getting the hostname is easy enough using urlparse:

import urlparse  # Python 2; Python 3 moved this to urllib.parse
hostname = urlparse.urlparse("http://www.techcrunch.com/").hostname

Getting the "root domain", however, is going to be more problematic, because it isn't defined in a syntactic sense. What's the root domain of "www.theregister.co.uk"? How about networks using default domains? "devbox12" could be a valid hostname.

For the most common cases, however, you can probably handle the former specially and ignore the latter, but be aware that it won't be 100% accurate.

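import urlparse  # Python 2; Python 3 moved this to urllib.parse
parts = urlparse.urlparse(url).hostname.split(".")
# Keep three labels if the next-to-last is short (".co.uk"), else two; a single label is kept as-is
hostname = ".".join(parts[-3:] if len(parts) > 1 and len(parts[-2]) < 4 else parts[-2:])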

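This uses the last three parts if the next-to-last part is shorter than four characters (e.g. ".com.au", ".co.uk") and the last two parts otherwise. Short second-level names trip it up, though: "www.bbc.com" comes back unchanged instead of as "bbc.com".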
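The same heuristic can be wrapped in a small Python 3 function. The name guess_root_domain is hypothetical, and returning None for links that have no scheme (and therefore no parsed hostname) is an added assumption. Running it on the hostnames discussed above shows the intended cases, the single-label "devbox12" case, and a misfire: "www.bbc.com" keeps its "www" because "bbc" is shorter than four characters.

```python
from typing import Optional
from urllib.parse import urlparse


def guess_root_domain(url: str) -> Optional[str]:
    """Heuristic from this answer: keep the last three labels when the
    next-to-last label is shorter than four characters (".co.uk",
    ".com.au"), otherwise keep the last two. Not exact by design."""
    hostname = urlparse(url).hostname
    if hostname is None:  # e.g. "example.com" with no "http://" prefix
        return None
    labels = hostname.split(".")
    if len(labels) > 1 and len(labels[-2]) < 4:
        return ".".join(labels[-3:])
    return ".".join(labels[-2:])  # a single label such as "devbox12" is returned as-is


for url in [
    "http://www.techcrunch.com/",     # -> techcrunch.com
    "http://www.theregister.co.uk/",  # -> theregister.co.uk
    "http://devbox12/",               # -> devbox12
    "http://www.bbc.com/",            # -> www.bbc.com (misfire)
]:
    print(url, "->", guess_root_domain(url))
```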

Ben Blank 2009-10-05 18:35:45

Answer 3

A:

This worked for my purposes. I figured I'd share it.

".".join("www.sun.google.com".split(".")[-2:])
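Taking the last two labels works for "www.sun.google.com" (giving "google.com") but returns "co.uk" for "www.theregister.co.uk". A sturdier approach matches the hostname against a list of public suffixes, such as the Public Suffix List at publicsuffix.org, and keeps one label beyond the longest match. The sketch below is illustrative only: PUBLIC_SUFFIXES is a hypothetical six-entry stand-in for that list, the function name registrable_domain is made up, and the real list's wildcard ("*") and exception ("!") rules are ignored.

```python
from typing import Optional

# Hypothetical, tiny stand-in for the Public Suffix List; a real
# implementation would load and parse the full list instead.
PUBLIC_SUFFIXES = {"com", "org", "uk", "co.uk", "au", "com.au"}


def registrable_domain(hostname: str) -> Optional[str]:
    """Return the longest known public suffix plus the label before it."""
    labels = hostname.lower().rstrip(".").split(".")
    # Increasing i shortens the candidate, so the first hit is the longest
    # suffix: "a.b.co.uk" tries "a.b.co.uk", "b.co.uk", "co.uk", "uk".
    for i in range(len(labels)):
        suffix = ".".join(labels[i:])
        if suffix in PUBLIC_SUFFIXES:
            # If the whole hostname is itself a suffix, there is no domain to return
            return ".".join(labels[i - 1:]) if i > 0 else None
    return None  # no known suffix, e.g. an intranet name like "devbox12"


print(registrable_domain("www.sun.google.com"))     # google.com
print(registrable_domain("www.theregister.co.uk"))  # theregister.co.uk
print(registrable_domain("www.bbc.com"))            # bbc.com
print(registrable_domain("devbox12"))               # None
```

Unlike the fixed two- or three-label rules, the answer here depends only on which suffixes are in the list, so correctness comes down to keeping that list current.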

Joe J 2010-07-30 06:53:24
