I have a list of domains and I want to sort them by TLD. What's the fastest way to do this?

+5  A: 

Use the key parameter to .sort() to provide a function that can retrieve the proper data to sort by.

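import urlparse  # Python 2 module; on Python 3 use: import urllib.parse as urlparse

def get_tld_from_domain(domain):
    # The last dot-separated label of the host part is treated as the TLD.
    return urlparse.urlparse(domain).netloc.split('.')[-1]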

list_of_domains.sort(key=get_tld_from_domain)

# or if you want to make a new list, instead of sorting the old one
sorted_list_of_domains = sorted(list_of_domains, key=get_tld_from_domain)

If you prefer, you can skip the separate function and pass a lambda as the key instead, but a named function often makes the code easier to read.
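
For comparison, here is a minimal sketch of the lambda form on Python 3, where the module lives at urllib.parse. The sample list is made up for illustration. It assumes every entry is a full URL, because urlparse only fills in netloc when the string has a scheme (or starts with //); for a bare name such as "example.com" the netloc comes back empty.

```python
from urllib.parse import urlparse  # Python 3 location of urlparse

# Hypothetical sample data; full URLs so that netloc is populated.
list_of_domains = [
    "http://example.org/page",
    "http://example.com/",
    "https://example.net",
]

# Same key as get_tld_from_domain, written inline as a lambda.
sorted_list_of_domains = sorted(
    list_of_domains,
    key=lambda url: urlparse(url).netloc.split('.')[-1],
)
print(sorted_list_of_domains)

# A bare domain has no scheme, so urlparse puts it in .path, not .netloc.
print(repr(urlparse("example.com").netloc))
```

If the list may contain bare names, splitting the string directly avoids the empty netloc.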

Amber
`sorted()` returns a list, so `list()` is redundant here.
J.F. Sebastian
Quite true. I've been working with generator expressions too much lately. :P
Amber
+2  A: 

Also, remember that it is not trivial to get the TLD from a URL. Please check this link on SO. In Python you can use the urlparse module to parse URLs.
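
To illustrate why this is not trivial, here is a small sketch. Taking the last label of "www.example.co.uk" gives "uk", while registrations there actually happen under the multi-label suffix "co.uk". The suffix set and both helper names below are hypothetical and hand-picked for the example; a real program would more likely consult the full Public Suffix List.

```python
# Hypothetical, hand-picked suffixes for illustration only;
# a real program would load the full Public Suffix List instead.
KNOWN_SUFFIXES = {"co.uk", "com.au", "com", "org", "uk"}

def naive_tld(domain):
    """Return the last dot-separated label."""
    return domain.split(".")[-1]

def longest_known_suffix(domain):
    """Return the longest trailing run of labels found in KNOWN_SUFFIXES."""
    labels = domain.lower().split(".")
    # Try the longest candidate first, then drop labels from the left.
    for start in range(len(labels)):
        candidate = ".".join(labels[start:])
        if candidate in KNOWN_SUFFIXES:
            return candidate
    return labels[-1]  # nothing matched: fall back to the last label

print(naive_tld("www.example.co.uk"))
print(longest_known_suffix("www.example.co.uk"))
```

Either function can be passed as the key to sorted(); which one is appropriate depends on whether "co.uk" and "uk" should be grouped together.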

Gangadhar
+1  A: 

As Gangadhar says, it's hard to know definitively which part of the netloc is the TLD, but in your case I would modify Amber's code slightly. This will sort on the entire domain, by the last level first, then the second-to-last level, and so on.

This may be good enough for what you need, without having to refer to external lists.

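import urlparse  # Python 2 module; on Python 3 use: import urllib.parse as urlparse

def get_reversed_domain(domain):
    # Labels in reverse order, e.g. ['com', 'example', 'www'], so the list compares TLD first.
    return urlparse.urlparse(domain).netloc.split('.')[::-1]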

sorted_list_of_domains = sorted(list_of_domains, key=get_reversed_domain)

Having just reread the OP: if the list already contains only domains, you can simply use

sorted_list_of_domains = sorted(list_of_domains, key=lambda x: x.split('.')[::-1])
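
As a quick worked example of the reversed-label key, the sketch below runs it on a made-up list of bare domain names. Python compares the resulting lists element by element, so entries are grouped by last label first, then by the label before it, and so on.

```python
# Hypothetical sample data: bare domain names, no scheme.
list_of_domains = [
    "www.example.org",
    "example.com",
    "mail.example.com",
    "example.co.uk",
    "alpha.net",
]

# Key is the list of labels in reverse, e.g. ['com', 'example', 'mail'].
sorted_list_of_domains = sorted(list_of_domains, key=lambda x: x.split('.')[::-1])

for domain in sorted_list_of_domains:
    print(domain)
# example.com
# mail.example.com
# alpha.net
# www.example.org
# example.co.uk
```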
gnibbler