I'm looking for a (preferably simple) way to find the bytes in a Python bytes object and order them by how often they occur, most common first.

e.g.

>>> freq_bytes(b'hello world')
b'lohe wrd'

or even

>>> freq_bytes(b'hello world')
[108,111,104,101,32,119,114,100]

I currently have a function that returns a list of counts indexed by byte value, so list[97] is the number of occurrences of "a". I need the byte values sorted by those counts.

I figure I basically need to invert the list (so that list[a] = b becomes list[b] = a) while removing the repeats.
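Rather than literally inverting the count list, one option is to sort the byte values by their counts. The sketch below assumes the count list has 256 entries indexed by byte value, as described above; `byte_counts` is a hypothetical stand-in for the existing function. Python's sort is stable even with `reverse=True`, so bytes with equal counts come out in ascending byte-value order.

```python
def byte_counts(data):
    # Hypothetical stand-in for the existing function:
    # counts[b] == number of times byte value b occurs in data.
    counts = [0] * 256
    for b in data:  # iterating a bytes object yields ints in Python 3
        counts[b] += 1
    return counts

def freq_bytes(data):
    counts = byte_counts(data)
    # Keep only the byte values that actually occur ("removing the repeats").
    present = [b for b in range(256) if counts[b]]
    # Sort those byte values by their count, most common first.
    return sorted(present, key=lambda b: counts[b], reverse=True)

print(freq_bytes(b'hello world'))         # [108, 111, 32, 100, 101, 104, 114, 119]
print(bytes(freq_bytes(b'hello world')))  # b'lo dehrw'
```

Ties here are broken by byte value rather than by first appearance, which is why the order differs slightly from the example in the question.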

+5  A: 

Try the Counter class in the collections module.

from collections import Counter

string = "hello world"
# most_common() returns (value, count) pairs, most frequent first
print(''.join(char for char, count in Counter(string).most_common()))

Note you need Python 2.7, or 3.1 or later, for Counter.

Edit: I forgot that most_common() returns a list of (value, count) tuples, so I used a generator expression to extract just the values.
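The same idea applies directly to a bytes object, which is what the question asks about. This is a sketch assuming Python 3.7+, where a Counter remembers insertion order and most_common() uses a stable sort, so bytes with equal counts keep their first-appearance order. `freq_byte_values` is a hypothetical name for the list-of-ints variant.

```python
from collections import Counter

def freq_bytes(data):
    # Iterating bytes yields ints in Python 3, so the Counter keys are byte values.
    # most_common() returns (byte, count) pairs sorted by count, highest first.
    return bytes(b for b, _count in Counter(data).most_common())

def freq_byte_values(data):
    # Same ordering, returned as a list of ints (the question's second form).
    return [b for b, _count in Counter(data).most_common()]

print(freq_bytes(b'hello world'))        # b'lohe wrd' on Python 3.7+
print(freq_byte_values(b'hello world'))  # [108, 111, 104, 101, 32, 119, 114, 100] on 3.7+
```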

kindall
+1 and added link to documentation
Adam Bernier
I'm on 3.1, so I don't think that's a problem.
Skyler
+3  A: 
def frequent_bytes(aStr):
    d = {}
    for char in aStr:
        d[char] = d.setdefault(char, 0) + 1

    myList = []
    for char, frequency in d.items():
        myList.append((frequency, char))
    myList.sort(reverse=True)

    # myList holds (frequency, char) tuples; join only the characters
    return ''.join(char for frequency, char in myList)

>>> frequent_bytes('hello world')
'lowrhed '

I just tried something obvious. @kindall's answer rocks, though. :)

Mike
+1 for the effort. Note that even if you're not on 2.7+, you can use `collections.defaultdict`.
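A sketch of the counting loop rewritten with `collections.defaultdict`, assuming the goal is the same most-common-first ordering. `defaultdict(int)` supplies 0 for missing keys, which replaces the `setdefault` call. The example outputs assume Python 3.7+, where dict insertion order is preserved and `sorted` is stable, so ties keep their first-appearance order.

```python
from collections import defaultdict

def frequent_bytes(data):
    # Missing keys start at 0, so no setdefault() is needed.
    counts = defaultdict(int)
    for item in data:  # chars for a str, ints for a bytes object
        counts[item] += 1
    # Sort the distinct items by their count, most common first.
    return sorted(counts, key=counts.get, reverse=True)

print(frequent_bytes(b'hello world'))  # [108, 111, 104, 101, 32, 119, 114, 100]
print(''.join(frequent_bytes('hello world')))  # 'lohe wrd'
```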
Adam Bernier
Aw that's the function I couldn't think of. Thanks!
Mike