ansaurus

Question

Answer 1

+3 A:

If you don't want to use the standard dictionary method (looping through the list incrementing the proper dict. key), you can try this:

>>> from itertools import groupby
>>> myList = words.split() # ['apple', 'banana', 'apple', 'strawberry', 'banana', 'lemon']
>>> [(k, len(list(g))) for k, g in groupby(sorted(myList))]
[('apple', 2), ('banana', 2), ('lemon', 1), ('strawberry', 1)]

It runs in O(n log n) time.

Nick Presta 2009-05-21 15:09:57

Answer 2

+13 A:

defaultdict to the rescue!

from collections import defaultdict

words = "apple banana apple strawberry banana lemon"

d = defaultdict(int)
for word in words.split():
    d[word] += 1

This runs in O(n).

Triptych 2009-05-21 15:10:59

+1, collections.defaultdict is one of my favorite containers!

Alex Martelli 2009-05-21 15:12:25

I'd say O(NlogN) if collection is a tree, or O(N) in average if it's a hash

Drakosha 2009-05-21 15:15:53

dict is a hash.

S.Lott 2009-05-21 15:17:55

or can skip the import statement and use a hash "d = {}"

Demi 2009-05-21 15:31:03

"d = {}" doesn't work, or I'm too noob to do it right.

Daniyar 2009-05-21 15:34:20

no, d={} won't work. The whole point of the defaultdict is that it will automatically initialize values to zero. A standard dict won't do that (you'll get a KeyError exception)

Triptych 2009-05-21 15:41:36

Answer 3

+4 A:

Standard approach:

from collections import defaultdict

words = "apple banana apple strawberry banana lemon"
words = words.split()
result = collections.defaultdict(int)
for word in words:
    result[word] += 1

print result

Groupby oneliner:

from itertools import groupby

words = "apple banana apple strawberry banana lemon"
words = words.split()

result = dict((key, len(list(group))) for key, group in groupby(sorted(words)))
print result

nosklo 2009-05-21 15:11:47

Is there a difference in complexity? Does groupby use sorting? Then it seems to need O(nlogn) time?

Daniyar 2009-05-21 15:27:30

Oops, it seems Nick Presta below has pointed out that the groupby approach uses O(nlogn).

Daniyar 2009-05-21 15:35:48

Answer 4

+13 A:

If you are using python 2.7+/3.1+, there is a Counter Class in the collections module which is purpose built to solve this type of problem:

>>> from collections import Counter
>>> words = "apple banana apple strawberry banana lemon"
>>> freqs = Counter(words.split())
>>> print(freqs)
Counter({'apple': 2, 'banana': 2, 'strawberry': 1, 'lemon': 1})
>>>

Since both 2.7 and 3.1 are still in beta it's unlikely you're using it, so just keep in mind that a standard way of doing this kind of work will soon be readily available.

sykora 2009-05-21 15:16:59

Wow! This is the pythonic way. Thanks for sharing this.

Daniyar 2009-05-21 15:23:05

pretty cool! * Goes back to Python 2.5 :( *

Triptych 2009-05-21 15:27:09

It is also in python 2.7.

nosklo 2009-05-21 17:00:53

Answer 5

A:

Without defaultdict:

words = "apple banana apple strawberry banana lemon"
my_count = {}
for word in words.split():
    try: my_count[word] += 1
    except KeyError: my_count[word] = 1

Thomas Weigel 2009-05-21 15:59:30

Seems slower than defaultdict in my tests

nosklo 2009-05-21 16:59:10

splitting by a space is redundant. Also, you should use the dict.set_default method instead of the try/except.

Triptych 2009-05-21 17:05:23

It's a lot slower because you are using Exceptions. Exceptions are very costly in almost any language. Avoid using them for logic branches.Look at my solution for an almost identical method, but without using Exceptions: http://stackoverflow.com/questions/893417/item-frequency-count-in-python/983434#983434

hopla 2009-06-11 20:30:41

Answer 6

+1 A:

freqs = {}
for word in words:
    freqs[word] = freqs.get(word, 0) + 1 # fetch and increment OR initialize

I think this results to the same as Triptych's solution, but without importing collections. Also a bit like Selinap's solution, but more readable imho. Almost identical to Thomas Weigel's solution, but without using Exceptions.

This could be slower than using defaultdict() from the collections library however. Since the value is fetched, incremented and then assigned again. Instead of just incremented. However using += might do just the same internally.

hopla 2009-06-11 20:21:44

ansaurus

tags:

views:

answers:

item frequency count in python

related questions