I have a list of domains and I want to sort them by TLD. What's the fastest way to do this?
+5
A:
Use the `key` parameter to `.sort()` to provide a function that can retrieve the proper data to sort by.
import urlparse  # Python 2 module; in Python 3 it is urllib.parse

def get_tld_from_domain(domain):
    # netloc is only populated when the entry is a full URL with a scheme
    return urlparse.urlparse(domain).netloc.split('.')[-1]

list_of_domains.sort(key=get_tld_from_domain)

# or, if you want to make a new list instead of sorting the old one
sorted_list_of_domains = sorted(list_of_domains, key=get_tld_from_domain)
If you prefer, you can skip the separate function definition and pass a lambda instead,
but defining the function separately can often make your code easier to read, which is always a plus.
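For comparison, here is a minimal sketch of the lambda form. It assumes the entries are bare hostnames with no scheme, so it splits the string directly rather than going through `urlparse` (which leaves `netloc` empty when there is no scheme). The sample list is made up for illustration.

```python
# Hypothetical sample data: bare hostnames, no "http://" prefix.
list_of_domains = ["example.org", "docs.python.org", "example.com", "bbc.co.uk"]

# Key written inline as a lambda: rsplit('.', 1)[-1] keeps only the
# text after the last dot, i.e. the final label of the hostname.
by_tld = sorted(list_of_domains, key=lambda d: d.rsplit(".", 1)[-1])

print(by_tld)
# ['example.com', 'example.org', 'docs.python.org', 'bbc.co.uk']
```

Because Python's sort is stable, entries that share a final label keep their original relative order.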
Amber
2010-08-24 04:34:30
`sorted()` returns a list, therefore `list()` is redundant here.
J.F. Sebastian
2010-08-24 07:11:22
Quite true. I've been working with generator expressions too much lately. :P
Amber
2010-08-24 07:49:17
+1
A:
As Gangadhar says, it's hard to know definitively which part of the netloc is the TLD, but in your case I would modify Amber's code slightly. This sorts on the entire domain: by the last level first, then the second-to-last level, and so on.
This may be good enough for what you need without referring to external lists.
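import urlparse  # Python 2 module; in Python 3 it is urllib.parse

def get_reversed_domain(domain):
    # Reverse the labels so comparison starts at the last level
    return urlparse.urlparse(domain).netloc.split('.')[::-1]

sorted_list_of_domains = sorted(list_of_domains, key=get_reversed_domain)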
I just reread the OP: if the list is already just domains, you can simply use
sorted_list_of_domains = sorted(list_of_domains, key=lambda x: x.split('.')[::-1])
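To make the resulting order concrete, here is a small sketch run on a made-up list of bare domains. The helper name `reversed_labels` and the sample data are assumptions for illustration only.

```python
# Hypothetical sample data: bare domains, no scheme.
domains = ["www.example.com", "example.org", "api.example.com",
           "example.com", "bbc.co.uk"]

def reversed_labels(domain):
    # "www.example.com" -> ["com", "example", "www"]
    return domain.split(".")[::-1]

# Lists compare element by element, so the last label is compared
# first, then the second-to-last, and so on.
result = sorted(domains, key=reversed_labels)

print(result)
# ['example.com', 'api.example.com', 'www.example.com',
#  'example.org', 'bbc.co.uk']
```

Note that `bbc.co.uk` sorts under `uk`, with `co` treated as the next level. That is the "good enough" trade-off: no public suffix list is consulted.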
gnibbler
2010-08-24 07:05:35